GLOSSARY OF SYMBOLS

t                     denotes a decision-maker
i                     denotes an alternative
$z_{ti}$              vector of variables characterizing decision-maker t and alternative i
T                     population of decision-makers
C                     choice set
$z_t$                 matrix consisting of vectors $z_{ti}$ for all $i \in C$
$Z_o$                 collection of attribute matrices $z_t$ faced by all decision makers in T
Z                     complete attribute space, of which $Z_o$ is a subset
f(i,z)                joint generalized probability density of i and z in population
P(i|z)                the choice model predicting probability i is chosen given z
p(z)                  generalized probability density of z in population
$\theta^*$            a vector of unknown parameters
$\theta$              an estimate of $\theta^*$
p(z)                  the generalized probability density of z after a policy change
Q(i)                  expected fraction of population choosing alternative i
Q(i)                  the expected fraction of population choosing i after a policy change
$(C \times Z)_b$      the b-th subset of $C \times Z$
B                     the number of subsets of $C \times Z$
$H_b$                 the fraction of the sample drawn from the b-th subset of $C \times Z$
H                     the vector $(H_1, \ldots, H_B)$
N                     the total sample size
$N_b$                 the number of observations drawn from the b-th subset of $C \times Z$
$T_b$                 the sub-population of T in the b-th subset of $C \times Z$
$(i_n, z_n)$          an observation from the population corresponding to the n-th observation drawn from the b-th subset of $C \times Z$
$F_b$                 the fraction of the population who are members of $T_b$
F                     the vector $(F_1, F_2, \ldots, F_b, \ldots, F_B)$
L                     the likelihood of a stratified sample
$L_r$                 the likelihood of a random sample
$L_e$                 the likelihood of an exogenous sample
$L_c$                 the likelihood of a choice-based sample
$Z_b$                 the b-th subset of Z defined for an exogenous sample
$C_b$                 the b-th subset of C defined for a choice-based sample
g(z)                  the sample distribution of z in an exogenous sample
y(z)                  a function mapping old attribute values into new ones
$z_k$                 the k-th entry in the attribute vector z
Q(i)                  an estimate of Q(i) from a sample enumeration
*(z)                  cumulative distribution of attributes in the population
^(z|b)                cumulative distribution of attributes in b-th subpopulation
5(z|b)                empirical distribution in subsample drawn from $(C \times Z)_b$
x(z)                  any vector valued function of the attribute vector z
E(x)                  the expected value of x
E(x|b)                the expected value of x in the b-th subpopulation
|C|                   the number of alternatives in the choice set
$z_m$                 the population median for z
$z_o$                 any particular value of z
Y                     the dependent variable in a linear model
$\beta$               a vector of parameters in a linear model
$\varepsilon$         a disturbance term
$V(\varepsilon)$      variance of $\varepsilon$
$\sigma^2$            an unknown scalar multiplier of variance of $\varepsilon$ in the linear model
G                     a known matrix, where $V(\varepsilon) = \sigma^2 G$ in the linear model
D                     a function which is strictly increasing in each of its arguments
$z_i - z_j$           the difference between the attribute vectors for the i-th and j-th alternative
$\phi^*$              a vector of parameters with same number of entries as $z_i - z_j$
$\gamma^*_i, \gamma^*_j$   alternative specific constants for the i-th and j-th alternatives respectively
$U_i$                 the random utility of the i-th alternative
I                     an identity matrix of dimension |C| - 1
V(z)                  the expected sampling variance of attributes across alternatives
$z_{ij}$              $z_i - z_j$
$Z_1$                 the post-experimental range of z
                      a subset of $\theta^*$, the parameter vector
F(i,z)                the post-experimental distribution of z in the population.
EXECUTIVE SUMMARY


In  the  past  five  to  ten  years,  significant  advances  have  been  made 
in  the  development  of  discrete  choice  models  for  travel  demand  analysis. 
Discrete choice models represent the choices of individuals among
alternatives such as modes of travel, auto types and destinations. As these
models  (such  as  multinomial  logit)  continue  to  be  applied  in  practice, 
there  is  a growing  need  for  a coherent  theory  of  how  data  should  be 
used  in  discrete  choice  analysis  and  for  practical  guidance  in  the  collection 
of  such  data. 

The  problems  associated  with  designing  samples  are  exemplified  by 
the  Urban  Mass  Transportation  Administration's  Service  and  Methods 
Demonstration Program. Under this program, changes in the
transportation services provided in an urban area are made, and the resulting
shifts  in  level  of  service  and  user  response  are  monitored.  Data  is 
collected  in  such  experiments  in  order  to  evaluate  the  impacts  of  the 
changes  and  to  generalize  the  results  to  other  situations.  The  problem 
of  how  to  collect  useful  data  in  a cost-effective  manner  is  critical  in 
such  evaluation  efforts. 

This paper is an effort to synthesize the state of the art in
sample design for one major aspect of evaluating transportation system change:
traveller behavior. The paper focuses on discrete choice analysis of
travellers' decisions. It incorporates recent published and unpublished
theoretical findings as well as some new results. In addition, it addresses
the  practical  concerns  which  arise  in  designing  and  using  data  samples 
in  travel  demand  analysis. 

A basic  assumption  of  discrete  choice  analysis  is  that  within  any 
population,  it  is  possible  to  characterize  any  individual  by  both  a 
list  of  attributes  such  as  income,  auto  ownership,  travel 
time,  and  costs  by  various  modes,  etc.,  and  an  actual  choice 
of  a discrete  alternative.  Throughout  the  paper,  it  is  presumed  that  the 
primary motivation for sampling is to learn something about the
characteristics of the population, the attributes of the alternatives
individuals face, and the choices they make. A central hypothesis in discrete
choice  analysis  is  that  there  is  a causal  link,  in  which  the  probability 
of  each  individual  choosing  any  particular  alternative  (termed  a choice 
probability)  depends  on  his/her  attributes;  changes  in  the  attributes 
will  therefore  change  the  choice  probabilities. 

Given  these  assumptions,  the  goal  of  any  particular  data  collection 
scheme  is  fairly  clear.  The  analyst  seeks  to  learn  about  (1)  the 
distribution of attributes in the population, and (2) the choice
probabilities for the population. For example, the most traditional approach
of  survey  data  collection  for  transportation  planning  has  been  the  home 
interview  survey.  This  provides  estimates  of  the  distribution  of  attributes 
such  as  income,  auto  ownership,  household  size,  age,  sex,  race,  etc., 
in  the  metropolitan  area.  This  data,  along  with  level  of  service  estimates 
(typically  derived  from  skim  trees),  provides  a relatively  complete 
estimate  of  the  distribution  of  attributes  in  the  population.  This  data  can 
then also be used to infer choice probabilities for the population using
models such as multinomial logit, in which the probability of any member
of the population selecting an alternative (e.g., driving alone, carpooling, or
using transit) depends on the attribute values.

The  home  interview  survey,  however,  is  just  one  of  a number  of 
sampling  strategies  of  potential  use  in  inferring  both  the  distribution  of 
attributes  and  choice  probabilities.  Given  this,  the  paper  considers 
four interrelated questions:

1.  What  different  sampling  strategies  for  discrete  choice  analysis 
exist? 

2.  How  can  different  sampling  strategies  be  used  to  estimate  the 
distribution  of  attributes  in  the  population? 

3.  How  can  different  sampling  strategies  be  used  to  estimate 
choice  probabilities? 

4.  What is the role of experimentation in improving travel demand
analysis?

Each  of  these  questions  is  considered  below. 

1.  What  different  sampling  strategies  exist? 

The review deals with a very broad class of sampling strategies (termed
stratified sampling), which includes as special cases random sampling, exogenous
sampling and choice-based sampling. In stratified sampling, the data
sample is assumed to be obtained by the following four steps:

a)  Divide  the  entire  population  into  groups  based  on  both  their 
attributes  and  the  decisions  made. 

b)  Choose  how  many  people  are  to  be  sampled  from  each  group. 

c)  Within  each  group,  sample  the  preset  number  of  people  at  random. 

d)  For each person, observe his/her attributes and the choice he/she
made.

It  is  important  to  note  that  the  strata  into  which  the  population  is 
divided can be defined by both attributes and choices. In mode choice,
for example, one stratification might be to split the population into
high and low income travellers; another might be to define transit users
and highway users as distinct groups. Stratifications based on combinations
of attributes and choices are also feasible.

The first, and most widely used, special case of general stratified
sampling is random sampling, in which the entire population is a single
stratum. The second is exogenous sampling (e.g., home interview
surveys), in which the stratification is based solely on attributes, not
on actual choices. The third is choice-based sampling (e.g., on-board
surveys), in which the stratification is based on choices but not
attributes.

It is important to stress that in stratified sampling, one can
control two items: the definition of the strata and the size of the
sample within each stratum. The analyst does not control which
decision-makers are actually sampled in each stratum, since these are drawn at
random.
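The four sampling steps and the three special cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_sample`, the record layout, and the synthetic population are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sizes, seed=0):
    """Draw a stratified sample following steps (a)-(d).

    population : list of (attributes, choice) records (hypothetical layout)
    stratum_of : function mapping a record to a stratum label; it may use
                 the attributes, the choice, or both
    sizes      : dict mapping each stratum label to the number to draw
    """
    rng = random.Random(seed)
    # (a) divide the population into strata
    strata = defaultdict(list)
    for record in population:
        strata[stratum_of(record)].append(record)
    # (b) the sizes are fixed by the analyst;
    # (c) within each stratum, draw the preset number at random
    sample = []
    for label, n in sizes.items():
        sample.extend(rng.sample(strata[label], n))
    # (d) every sampled record carries both attributes and the observed choice
    return sample

# Synthetic population (an assumption): income in $1000s and a chosen mode.
gen = random.Random(1)
population = [({"income": gen.uniform(5, 50)}, gen.choice(["auto", "transit"]))
              for _ in range(1000)]

# Random sampling: the whole population is a single stratum.
random_sample = stratified_sample(population, lambda r: "all", {"all": 100})
# Exogenous sampling: strata defined by an attribute only.
exogenous = stratified_sample(
    population,
    lambda r: "high" if r[0]["income"] >= 25 else "low",
    {"high": 50, "low": 50},
)
# Choice-based sampling: strata defined by the observed choice only.
choice_based = stratified_sample(population, lambda r: r[1],
                                 {"auto": 50, "transit": 50})
```

In each call the analyst supplies only the stratum definition and the stratum sample sizes; which individuals are drawn is left to the random number generator.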

2.  How  can  different  sampling  strategies  be  used  to  infer  the  distribution 
of  attributes  in  the  population? 

There  are  two  general  approaches  to  using  stratified  samples  to 
estimate the distribution of attributes in the population. The simpler,
less general method is to constrain the sample design such that
the  fraction  of  observations  in  each  stratum  equals  the  corresponding 
population  fraction.  In  this  case,  the  resulting  stratified  sample  can 
be  used  as  "representative"  of  the  population. 

The second, more general approach uses a simple
probability statement to solve for the population attribute distribution
as a weighted sum of the attribute distributions within the strata. The
weights in this case are the fractions of the population in each stratum.
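As a minimal sketch of the weighted-sum approach, the following Python function estimates a population mean from stratum subsamples, weighting each stratum mean by its population share (written F_b in the glossary). The function name, the stratum labels, and the numbers are hypothetical and chosen only for illustration.

```python
def population_mean(stratum_samples, population_shares):
    """Weighted estimate of a population mean from a stratified sample.

    stratum_samples   : dict label -> list of observed attribute values
    population_shares : dict label -> share of the population in that stratum
    """
    total = 0.0
    for label, values in stratum_samples.items():
        stratum_mean = sum(values) / len(values)
        total += population_shares[label] * stratum_mean
    return total

# Hypothetical design: equal subsample sizes, although transit users are
# assumed to make up only 10% of the population.
samples = {"transit": [10.0, 12.0, 14.0], "auto": [30.0, 32.0, 34.0]}
shares = {"transit": 0.1, "auto": 0.9}

weighted = population_mean(samples, shares)               # about 30
unweighted = sum(sum(v) for v in samples.values()) / 6    # 22.0
```

The gap between the two figures shows why the population shares are needed whenever the sample fractions differ from the population fractions.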

In  both  these  approaches,  the  analyst  must  know  the  share  of 
population  in  each  stratum.  At  least  four  approaches  to  determining  these 
shares  are  available. 

a)  Use existing data sources which yield direct information on strata
shares (such as the census for geographically based stratifications).

b)  Use  a random  sample  (such  as  a telephone  survey)  to  estimate  the 
share  of  the  population  in  each  stratum. 

c)  Use  published  statistics  and  solve  a set  of  linear  equations 
derived  from  probability  theory  for  population  shares. 

d)  Estimate the population fractions simultaneously with the choice
model.

Each of these techniques has strengths and weaknesses, which are described
in the paper, and some may not be applicable in all situations.

3.  How can different sampling strategies be used to estimate the choice
probabilities?

In  almost  all  cases  of  practical  interest,  the  problem  of  determining 
choice  probabilities  reduces  to  one  of  estimating,  or  calibrating,  the 
parameters  of  some  model.  For  example,  in  mode  choice  analysis,  the  choice 
probabilities  might  be  represented  by  the  multinomial  logit  model,  and 
the  coefficients  of  the  model  would  have  to  be  estimated. 

A number  of  significant  theoretical  advances  have  been  made  in  this 
area.  Perhaps  the  most  significant  conclusion  is  that,  under  certain 
technical  restrictions,  any  stratified  sample  can  be  used  to  estimate 
discrete  choice  models.  This  includes  random,  exogenous,  choice-based, 
and  mixed  sample  designs. 

Given  that  any  stratified  sample  may  be  used,  a related  question 
is  how  to  select  the  "best"  sample  design.  Major  observations  in  this 
area include the following:

a)  The existing literature on sample design is based on the classical
criterion  of  minimizing  the  variance  of  parameter  estimates. 

b)  Mathematical results have been difficult to achieve and are still
limited.

c)  Prior  information  about  the  shares  of  the  population  in  the  strata 
and  the  distribution  of  attributes  in  the  population  can  improve 
parameter  estimates. 

d)  The "best" sample design (by the classical criterion) depends on
the true parameter values, which are obviously unknown a priori.
This contrasts with the standard linear regression model, in which
an optimal sample design can be easily derived and does not depend
on the actual model parameters.

e)  Limited  Monte  Carlo  tests  suggest  that  for  binary  choice  models 
estimated  with  choice-based  samples,  it  is  advantageous  to  make 
sample  shares  close  to  1/2,  and  that  prior  knowledge  of  the  share 
of  the  population  choosing  each  alternative  is  very  valuable. 

f)  If  one  wants  to  choose  a sample  to  test  the  hypothesis  that  a 
particular  model  as  a whole  is  more  informative  than  a model  in 
which  the  choice  probabilities  for  every  individual  equal  the 
corresponding population shares, the best sample includes
decision-makers facing widely disparate alternatives. This result does
not apply to samples designed to test other hypotheses.

4.  What  is  the  role  of  experimentation  in  improving  travel  demand  forecasts? 

A significant  problem  in  discrete  choice  analysis  is  that  under  some 
conditions it is impossible to estimate certain parameters. This situation
(termed non-identification) can arise in four important ways:

a)  The  alternatives  everyone  faces  are  homogeneous  along  some  attribute. 
For  example,  in  analyzing  taxi  users'  choice  of  which  cab  company 
to call for service, it would be impossible to determine the effect
of fare differences (with corresponding variations in service
quality) across taxi operators, since, due to local regulatory policy,
all companies provide roughly homogeneous service at the same price.


b)  An attribute of a particular alternative is constant for all
decision-makers. For example, in most cities, transit systems do not offer
any  demand  responsive  services,  while  the  auto  mode  is  by  its  very 
nature  demand  responsive.  In  this  case,  it  would  be  impossible  to 
estimate mode choice probabilities for route deviation bus service.

c)  Two attributes are perfectly correlated. For example, in many
small cities where congestion is minimal, taxi fare (based on
meters) may for all practical purposes be perfectly correlated
with in-vehicle travel time. Car times and operating costs would
be similarly correlated. Thus, it would be impossible to
distinguish between the effects of time and cost in a choice model.

d)  Alternatives  are  unavailable.  For  example,  some  cities  do  not 
have  any  transit  service  at  all,  and  the  demand  for  such  service 
would  be  impossible  to  determine. 
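Cases (a) and (c) can be illustrated with a simple rank check on the attribute data. The sketch below assumes a binary choice in which each row holds the attribute differences between the two alternatives for one decision maker; full column rank is treated here as a necessary condition for identification, not a sufficient one. The function names and data are hypothetical.

```python
def column_rank(rows, tol=1e-9):
    """Rank of a matrix given as a list of rows, by Gaussian elimination."""
    m = [[float(x) for x in row] for row in rows]
    n_cols = len(m[0])
    rank = 0
    for col in range(n_cols):
        # choose the remaining row with the largest entry in this column
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]),
                    default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new information
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

def identified(rows):
    """True if every attribute column contributes independent variation."""
    return column_rank(rows) == len(rows[0])

# Columns: difference in travel time, difference in cost (made-up values).
varied = [[5, 2], [10, 6], [15, 5], [20, 12]]
metered = [[5, 2.5], [10, 5], [15, 7.5], [20, 10]]   # cost proportional to time
same_fare = [[5, 0], [10, 0], [15, 0], [20, 0]]      # no fare variation at all

checks = (identified(varied), identified(metered), identified(same_fare))
```

With these inputs only the first data set has two independent columns; in the other two, the separate effects of time and cost cannot be told apart from the data alone.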

It  is  important  to  point  out  that  carefully  thought  out  experiments 
can be used to create situations in which previously unestimable
parameters of discrete choice models can be estimated. Many of the current
Service  and  Methods  Demonstration  projects  serve  precisely  this  function. 

By  changing  current  attributes  for  a small  group  within  a larger  population 
and  estimating  a discrete  choice  model,  it  is  then  feasible  to  forecast 
how  the  entire  population  will  respond  to  area-wide  implementation. 

Experiments  can  also  provide  a way  to  achieve  greater  confidence  in 
parameter  estimates.  In  many  situations,  some  parameters  are  technically 
identified,  but  the  amount  of  variation  in  the  data  is  too  low  to  make 
precise  parameter  estimates  feasible. 


Some  Practical  Considerations 

All  of  the  above  discussion  is  based  on  theoretical  results.  It  is 
important  to  emphasize  that  given  the  current  state  of  the  art,  there  is 
no  general  rule  for  selecting  the  best  sample  design  for  discrete  choice 
analysis  problems.  In  fact,  given  the  analytic  intractability  of  many  of 
the  sample  design  problems,  it  is  unlikely  that  a rule  for  optimal  sample 
design  will  be  found  in  the  near  future.  However,  some  general  practical 
guidelines can be proposed:

a)  The first concern in designing a sample
must be to ensure that a model can be estimated at all (i.e., that
the model parameters are identified in the sample).

b)  The duration of experiments must be carefully considered.
Existing discrete choice models are for the most part static in structure,
and  any  dynamic  effects  in  the  period  between  implementation  and 
response  cannot  be  reflected. 

c)  The classical statistical framework should not be applied
dogmatically. In most cases, the analyst has more information than
classical statistical analysis presumes. For example, in most cases,
some a priori statement about the sign and magnitude of certain
parameters will be possible. This information should be used,
if only in an intuitive way.

d)  A particular sample may be used for estimation of both the
attribute distribution and the choice probabilities. A "good" sample
design  must  balance  these  uses. 

e)  In many cases, the idealized stratified sample may be difficult
to obtain. In particular, stratified sampling requires the ability
to  identify  the  stratum  to  which  a decision  maker  belongs  and  the 
ability  to  draw  at  random  from  each  stratum.  The  last  requirement 
is  often  violated  in  common  survey  practices  such  as  roadside  interviews, 
on-board  surveys,  and  mailback  questionnaires. 

It  is  important  to  note  that  many  of  the  results  reported  here  are 
quite  recent,  and  that  further  work  will  undoubtedly  resolve  some  of  the 
questions  raised  in  the  report.  Discrete  choice  analysis  is  still  a quickly 
growing  area  of  knowledge,  and  further  work  on  sample  design  problems 
will  hopefully  make  more  precise  statements  about  alternative  sampling 
strategies possible. In particular, further work in classical sample
design analysis, non-classical sample design criteria (e.g., use of Bayesian
analysis), further Monte Carlo studies, and a broader base of actual
experience with different stratified sampling rules should yield greater
insight  into  sample  design  for  discrete  choice  analysis. 
1.  INTRODUCTION


This  paper  summarizes  recent  advances  in  the  theory  of  sample  design  for 
discrete  choice  analysis  and  discusses  the  relevance  of  these  advances  for 
the  practice  of  travel  demand  forecasting.  The  objective  of  the  summary  is 
to provide a general framework for analyzing existing data and designing new
samples.

The  issues  that  arise  in  designing  useful,  cost-effective  samples  are 
exemplified  by  the  Urban  Mass  Transportation  Administration's  Service  and 
Methods  Demonstration  Program.  The  types  of  transportation  system  changes 
introduced under this program influence both the performance of the
transportation system and the population's response to that system. In evaluating
such  projects,  a relatively  large  base  of  data  must  be  collected,  and  the 
cost  and  accuracy  of  the  resulting  samples  may  critically  influence  the 
success  or  failure  of  the  entire  evaluation  effort. 

Moreover, in many demonstrations, the impact of the transportation
system change is confounded with the effect of other, exogenous changes which
occur over the demonstration period. This is particularly the case when the
population's response to the system change is being measured. For this
reason, the data collection strategy must often provide an adequate base to
support a multivariate analysis of traveller response. Discrete choice
analysis has enormous potential for providing evaluations of population response
in such situations.

In developing this review, an effort is made to recognize the
distinction between the idealizations imposed by formal theory, and the real world
issues arising in practice. Discrete choice models rest on a set of
assumptions about the population and its behavior. In interpreting the theory
of  sample  design,  one  must  keep  in  mind  that  this  theory  relates  to  the 
idealized  world  of  formal  analysis.  The  application  of  sample  design  theory 
to  actual  travel  demand  forecasting  problems  is,  if  anything,  an  art. 

Our first task, undertaken in Section 2, will be to set out
the idealized probability model assumed in formal discrete choice analysis.
The  analyst  will  usually  attempt  to  specify  a model  which  accurately 
represents  the  actual  population  of  interest,  but  at  the  same  time  is 
simple  enough  to  provide  a useful  tool  for  forecasting.  In  most  applied 
contexts,  however,  the  analyst's  knowledge  of  the  actual  population  will 
not  suffice  to  totally  specify  a satisfactory  probability  model  and  its 
parameters a priori. From this emerges the purpose of data collection:
to allow one to learn more about the population and
consequently  to  improve  one's  ability  to  forecast  how  that  population 
will  respond  to  transportation  system  changes. 

Our  second  task,  addressed  in  Section  3,  will  be  to  describe 
some  of  the  alternative  sampling  rules  that  can  be  used  in  data  collection. 
Attention will be focused on rules in which the population is stratified in
some way, and observations are then drawn at random within strata.
This wide class of sample designs includes almost all currently used
methods in transportation planning. For example, a home interview survey
is typically performed by sampling randomly from the entire relevant
population; on-board surveys are random samples from the stratum of transit users.

For  the  purposes  of  this  review,  it  is  useful  to  separate  the  travel 
demand  analysis  process  into  two  phases.  First,  there  is  a population 
description  phase  in  which  one  formally  characterizes  the  decision  making 
population and the travel alternatives its members face. This
characterization can include distributions of socioeconomic attributes such as
income,  auto  ownership,  or  household  size,  as  well  as  level  of  service 
variables  such  as  time  and  cost.  Second,  there  is  a choice  modelling 
phase  in  which  one  specifies  a model  of  travel  behavior  (e.g.  logit  or 
probit),  and  estimates  its  unknown  parameters.  A need  for  data  samples 
may appear in both of the phases. Sections 4 and 5, respectively,
consider in detail the sample design problem that arises in the two phases
of  travel  demand  analysis.  In  these  sections  the  known  theoretical 
results  on  sample  design  are  collected,  and  those  respects  in  which  sampling 
must  remain  an  art  are  articulated. 

Usually  the  travel  demand  forecasting  process  draws  its  data  from 
observations  of  travel  behavior  under  whatever  travel  environment 
happens  to  prevail  at  the  time  of  data  collection.  Sometimes,  however, 
the  existing  travel  environment  does  not  contain  a range  of  attributes 
sufficiently  varied  to  permit  inference  of  travel  behavior  to  proceed. 

In  such  circumstances,  an  additional  phase  may  usefully  be  added  to  the 
analysis  process.  This  is  an  experimentation  phase  in  which  the  existing 
travel environment of a subset of the population is artificially modified
so  as  to  create  the  variation  in  travel  alternatives  needed  to  support 
behavioral modelling. In Section 6, we examine issues in the design of
such experiments and their role in the forecasting process. Directions
for  future  research  are  indicated  in  Section  7. 

The  theoretical  results  on  sample  design  reported  in  this  paper 
are  drawn  from  a number  of  sources.  In  particular,  we  draw  heavily  from 
Lerman,  Manski  and  Atherton  (1975),  Manski  and  Lerman  (1977),  Manski  and 
McFadden  (1977),  and  Cosslett  (1977).  Some  of  the  work  presented  here  is 
new  and  has  not  previously  been  reported. 
2.  PROBABILITY MODEL


The  probability  model  underlying  modern  discrete  choice 
analysis  of  travel  behavior  has  been  laid  out  in  a general  form  in 
Manski  and  McFadden  (1977)  and  is  summarized  here. 

It is assumed that an idealized decision making population,
representing the actual population of interest, has been defined. Each
member of this idealized population faces a common, finite set of
travel alternatives. Let T designate the population and C the choice
set. With each decision maker $t \in T$ and alternative $i \in C$ there is
associated a vector $z_{ti}$ which characterizes the decision maker and
the alternative. Let $z_t = (z_{ti},\ i \in C)$ be the matrix of attributes
characterizing decision maker t's choice set and let $Z_o = (z_t,\ t \in T)$
be the collection of attribute matrices faced by the various decision makers
in T. Finally, let Z denote the attribute space, of which $Z_o$ is a subset.
That is, Z is a collection of attribute matrices including at least those
currently faced by decision makers. Usually, Z will be defined so as to
encompass attributes that might be found among the population in the future
as well as those included in $Z_o$.^1

In actual applications, the definition of T (the decision-making
population), C (the choice set) and Z (the attribute space) varies considerably.
For example, in an analysis of mode choice for work trips, T may include
all workers travelling on a particular day to or from their place of
employment. The choice set may include modes such as driving alone, transit,
and carpooling. The attribute space may include times, costs, etc., for each
mode, socioeconomic characteristics such as income and auto ownership, as
well as functions of both these types of attributes.

^1 An integral part of the sample design problem is to decide what attributes of
decision makers and alternatives should be obtained in the data collection
process. Thus, the structure of the attribute space Z is under the potential
control of the analyst. This aspect of sample design will not be discussed in
this paper. Instead it will be assumed that a structure for Z has somehow been
chosen.

Using  this  notation,  the  basic  probabilistic  assumption  is  that  the 
frequency distribution of choices (i) and attribute matrices (z) in the
actual  population  can  be  characterized  by  a generalized  probability  density 

(1)  f(i,  z)  = P(i|z)  p(z) 


defined  over  C x Z. 
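Two marginals follow directly from equation (1). The symbol Q(i) is taken from the glossary (the expected fraction of the population choosing alternative i); the integral is used here as shorthand for a sum over any mass points of the generalized density plus an ordinary integral over its continuous part.

```latex
\sum_{i \in C} f(i, z) = p(z) \sum_{i \in C} P(i \mid z) = p(z),
\qquad
Q(i) = \int_{Z} f(i, z)\, dz = \int_{Z} P(i \mid z)\, p(z)\, dz .
```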


In  discrete  choice  analysis,  the  decomposition  of  the  joint  density 
f(i,z)  into  the  product  of  the  conditional  probability  P(i|z)  and  the 
marginal density p(z) is of particular importance. Here
P(i|z) is not simply a conditional probability; it is rather
the  probabilistic  prediction  of  a behavioral  model  describing  how  a 
decision  maker  with  associated  attributes  z would  select  among  the 
alternatives  in  C.  For  the  above  reason,  P(i|z)  is  often  termed  a 
"choice  probability".  In  most  applications,  the  behavioral  model  generating 
P(i|z) is a priori specified by the analyst to be a member of a
parametric family, implying that P(i|z) itself is known up to this family. For
example, one might assume that the choice probabilities have the conditional
logit form

$$P(i \mid z) = \frac{e^{z_{ti}\theta^*}}{\sum_{j \in C} e^{z_{tj}\theta^*}}$$

where $\theta^*$ is a vector of unknown parameters. In this case, the choice
probability may be written as $P(i \mid z, \theta^*)$.
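The conditional logit form can be evaluated directly. The following is a minimal Python sketch; the function name, the attribute values, and the parameter values are hypothetical, chosen only to show the computation for one decision maker with three alternatives.

```python
import math

def logit_probabilities(z_t, theta):
    """Conditional logit choice probabilities for one decision maker.

    z_t   : list of attribute vectors, one per alternative in the choice set
    theta : parameter vector with one entry per attribute
    """
    utilities = [sum(z * th for z, th in zip(z_ti, theta)) for z_ti in z_t]
    # Subtracting a common constant leaves the ratios unchanged and avoids
    # overflow in exp() when the utilities are large.
    shift = max(utilities)
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up attributes: travel time (minutes) and cost (dollars).
z_t = [[20.0, 1.50],   # drive alone
       [35.0, 0.75],   # transit
       [25.0, 0.80]]   # carpool
theta = [-0.05, -0.40]  # illustrative values, not estimates from any data

p = logit_probabilities(z_t, theta)
```

The probabilities sum to one, and the ratio of any two of them depends only on the difference between the corresponding linear terms.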

Consider  now  four  basic  assumptions  of  the  above,  general  model. 
One  seemingly  restrictive  assumption,  namely  that  all  members  of  T face 
the same choice set C, is actually innocuous. To see this, observe that
the attributes can vary with each individual t. Also, observe that if
some alternative j is "unavailable" to a decision maker t, this fact can
be reflected in the value taken by $z_t$, and we can set P(j|z) = 0 in this case.
For example, the alternative of driving alone is for obvious reasons
generally assumed to be unavailable to travellers without access to an
automobile. This can be reflected in the choice model P(j|z), where
j = driving alone, by defining one attribute in z to be auto availability,
and defining P(j|z) = 0 when this attribute is zero.^1

In most applications, the decision making population of interest
is relatively large. It then becomes analytically convenient, and
basically innocuous, to let the idealized population T be infinite so that
distinctions between sampling with and without replacement can be ignored.
With this second assumption of T possibly infinite, it is natural to
characterize the distribution of attributes in the population by a
generalized probability density p(z).²

As  Manski  and  McFadden  (1977)  emphasize,  the  application  of  discrete 
choice  analysis  does  require  one  crucial  assumption  not  imposed  in  the 
general  statistical  analysis  of  discrete  data.  This  is  the  postulate 
(typically  derived  from  some  behavioral  theory)  that  the  probability 
P(i|z)  reflects  a "causal"  link  between  the  independent  variables  z and 
the choice of any alternative i ∈ C. Moreover, it is implicit in the use
of the model to make forecasts that this link will continue to hold even
if the joint distribution of (i,z) pairs in the population is changed.

¹ At the extreme, some alternative may be currently available to no decision
maker in T. Nevertheless, it would still be desirable to formally include
such an alternative within the choice set if the alternative might become
available in the future.

² A generalized probability density is simply a mixture of a probability
distribution assigning positive probability to a finite set of points
in Z and an ordinary probability density function over Z. This assumption,
like that of infinite T, can generally be accepted without concern.

This  third  assumption  provides  the  basis  for  the  use  of  discrete 
choice  analysis  as  a tool  in  travel  demand  forecasting.  Under  the 
behavioral  postulate,  changes  in  transportation  policy  may  modify 
people's  travel  environments,  as  expressed  in  the  attribute  distribution 
p(z),  but  do  not  change  their  behavior,  as  expressed  in  the  choice 
probabilities  P(i|z).  Thus,  if  the  choice  probabilities  are  known  to  the 
analyst, and if the effect of a policy change on the attribute distribution
can be determined, the effect of that policy change on travel
choices  can  be  predicted. 

For example, assume that the initial attribute distribution is p(z),
that a proposed policy change will modify it to $\tilde{p}(z)$, and that we wish
to predict the effect of the policy change on the fraction of the population
making various travel choices. In more concrete terms, p(z) could
include the distribution of travel cost in the population before some
pricing change, and $\tilde{p}(z)$ would be the same distribution after the change.
For any alternative i ∈ C, the expected fraction initially choosing i is, by
definition,

$$Q(i) = \int_Z P(i \mid z)\, p(z)\, dz.$$

Given the behavioral postulate, the predicted post-policy fraction is:

$$\tilde{Q}(i) = \int_Z P(i \mid z)\, \tilde{p}(z)\, dz.$$
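The forecasting logic above can be illustrated with a small numerical sketch. We assume, purely for illustration, a discrete attribute distribution, a binary logit model for the transit alternative, and a policy that doubles fares; the parameter values and the attribute points are hypothetical. The choice model is held fixed while only the attribute distribution changes.

```python
import math

def p_transit(fare, time_diff, theta_fare=-0.8, theta_time=-0.03):
    """Binary logit P(transit | z); parameter values are illustrative only."""
    v = theta_fare * fare + theta_time * time_diff
    return 1.0 / (1.0 + math.exp(-v))

def expected_share(choice_prob, density):
    """Q(i) = sum over z of P(i|z) p(z) for a discrete attribute distribution."""
    return sum(choice_prob(*z) * mass for z, mass in density.items())

# Hypothetical pre-policy distribution p(z) over
# z = (fare in dollars, transit time minus auto time in minutes).
p = {(0.25, 10.0): 0.5, (0.25, 30.0): 0.3, (0.50, 20.0): 0.2}

# Policy: fares double; the choice model P(i|z) is left unchanged.
p_tilde = {(2 * fare, dt): mass for (fare, dt), mass in p.items()}

Q = expected_share(p_transit, p)               # pre-policy share
Q_tilde = expected_share(p_transit, p_tilde)   # predicted post-policy share
```

Because the fare coefficient is assumed negative, the predicted transit share falls after the fare increase.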

A fourth  assumption  implicit  in  discrete  choice  analysis  as  currently 
practiced  deserves  comment.  The  probability  model  we  have  set  out  is 
static,  that  is,  it  describes  the  population's  travel  environment  and 
behavior  in  a manner  which  ignores  time.  Recently,  researchers  have  begun 
to develop discrete choice models which explain sequences of choices over
time, and attempt to realistically incorporate dynamic aspects of
behavior.¹ Because research on dynamic choice analysis is still in its
infancy, and because no corresponding literature on sample design has yet
developed, this paper confines its attention to the less general but
quite  rich  models  of  static  choice  analysis. 

With the probability model of discrete choice analysis laid
out and interpreted, the formal objectives of data collection can now
be expressed. These are first to learn the form of the attribute
distribution p(z), and second to learn the choice probabilities P(i|z).

Where, as is usual, the choice probabilities are a priori given the
parametric form P(i|z, θ*), the second objective reduces to one of
estimating θ*. The design of samples meeting the above objectives
will be examined as soon as we have, in the next section, introduced
a class of sampling rules suitable for investigation.


¹ See, for example, Heckman (1977).
3. CLASS OF STRATIFIED SAMPLING RULES


Within the general model developed above, one can view the process of
sampling as drawing observations on individuals (as described by their
attributes z) and their respective choices i ∈ C. Observations are (i,z) pairs,
from which the probabilities p(z) and P(i|z) may be learned. For example,
in  a typical  application,  an  observation  might  consist  of  a traveller  with 
a known  mode  choice  among  carpooling,  driving  alone,  and  transit,  along  with 
associated  attributes  such  as  times  and  costs  on  each  mode,  as  well  as  the 
traveller's  socioeconomic  characteristics. 

In  describing  alternative  sampling  rules,  we  shall  first  present  a 
relatively  abstract  theoretical  development,  and  then  illustrate  that  theory 
with  a brief  example.  Finally,  we  shall  consider  the  most  useful  practical 
cases  within  the  class  of  rules  to  be  considered. 

The existing theoretical literature on sample design in discrete choice
analysis essentially assumes that such observations are drawn in the
following manner:¹

First, the analyst partitions the set C x Z, consisting of all possible
choice-attribute pairs, into a collection of B mutually exclusive and
exhaustive subsets (C x Z)_b, b = 1, ..., B. Such a partitioning is conventionally
termed a "stratification" of C x Z.

Second, the analyst selects a set of sampling fractions² (H_b, b = 1, ..., B)
such that

$$\sum_{b=1}^{B} H_b = 1,$$

and a sample size N. Then, for each b = 1, ..., B, a total of N_b = H_b · N
decision makers are independently drawn, at random, from T_b, the
sub-population of T defined by

$$T_b = \{\, t \in T : (i_t, z_t) \in (C \times Z)_b \,\}.$$

¹ See Manski and McFadden (1977) for a formal presentation.

² The sampling fractions may be set directly or may themselves be determined by
an auxiliary exogenous process. In the former case, we speak of a "single stage"
stratification; in the latter case, a "multi-stage" one. Cluster sampling is one
type of multi-stage process.
Finally, for each sampled decision maker, the associated choice-attribute
pair is observed. Thus, a sample (i_n, z_n), n = 1, ..., N_b, b = 1, ..., B
of such pairs is produced.¹
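The three steps just described (stratify, fix the sampling fractions and sample size, then draw at random within each sub-population) can be sketched as follows. This is a minimal illustration under stated assumptions: the population is a finite list of (i,z) pairs, draws are made with replacement to mimic independent sampling from a large T_b, and the function names and the synthetic population are hypothetical.

```python
import random

def draw_stratified_sample(population, stratum_of, H, N, seed=0):
    """Draw N_b = round(H_b * N) members at random from each sub-population T_b.

    population : list of (i, z) pairs, one per decision maker in T
    stratum_of : function mapping an (i, z) pair to its stratum index b
    H          : list of sampling fractions H_b summing to one
    """
    rng = random.Random(seed)
    B = len(H)
    T = [[] for _ in range(B)]          # the sub-populations T_b
    for pair in population:
        T[stratum_of(pair)].append(pair)
    sample = []
    for b in range(B):
        N_b = round(H[b] * N)
        # Sampling with replacement mimics independent draws from a large T_b.
        sample.extend((b, rng.choice(T[b])) for _ in range(N_b))
    return sample

# Hypothetical population: the choice i is a mode, z is (income,).
rng = random.Random(1)
population = [(rng.choice(["carpool", "drive_alone", "transit"]),
               (rng.uniform(2000, 30000),)) for _ in range(5000)]

def income_stratum(pair):
    """Exogenous stratification into three income classes."""
    income = pair[1][0]
    return 0 if income <= 7500 else (1 if income <= 15000 else 2)

sample = draw_stratified_sample(population, income_stratum, H=[0.5, 0.3, 0.2], N=200)
```

Note that the analyst fixes only the stratification and the counts N_b; which decision makers appear in the sample is left to chance.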

Consider again the example of a simple mode choice model in which
travellers choose among carpooling, driving alone and transit. Furthermore,
assume that time, cost, income and auto ownership are the only relevant
variables in the model. In this example, the set C x Z would be all feasible
combinations of modes and their attributes.

In  Figure  1,  the  modes  in  C are  rows  and  the  attributes  Z are  columns. 

[Figure 1. Display of C x Z]


¹ It would be notationally more proper to write (i_{n_b}, z_{n_b}), n_b = 1, ..., N_b, b = 1, ..., B.
For simplicity, the subscript b on n is omitted.
Thus, Z consists of eight separate components, seven of which are for all
practical purposes continuous and one of which, auto ownership, is discrete.
Any particular observation of an (i,z) pair in C x Z would consist of the mode
chosen and the eight modal and socioeconomic attributes of the sampled
individual.

The  simplest  possible  stratification  of  C x Z would  be  to  define  only 
one  stratum  consisting  of  all  of  C x Z.  The  corresponding  sample  would 
be drawn randomly from this single stratum.

A second stratification rule might define three income groups, as follows:

(C x Z)_1 = all (i,z) pairs with income ≤ $7500
(C x Z)_2 = all (i,z) pairs with income between $7500 and $15,000
(C x Z)_3 = all (i,z) pairs with income > $15,000

Still a third possible stratification might be to define modal user groups,
such as:

(C x Z)_1 = all carpool users
(C x Z)_2 = all drive alone users
(C x Z)_3 = all transit users.
Further examples can include mixtures of the above, such as the following:

(C x Z)_1 = all transit users with income ≤ $7500
(C x Z)_2 = everyone else

In each of these examples, B would equal the number of strata, and T_b, b = 1, ..., B
would be the relevant subsets of the population.

The class of stratified sampling rules in general and the most relevant
special cases in particular offer an enormous range of sample design
possibilities to the analyst. Of course, many interesting rules lie outside the
stratified class. Some attention has recently been given to samples created
by mixing stratified sub-samples of different types. In particular, the
so-called "enriched" samples, in which a sample of users of one alternative in C and a
random sample are combined, have been studied by McFadden (1977) and Cosslett
(1978a).

There may in some circumstances be advantages to using sampling
strategies in which the sample size is determined as part of, rather than prior to,
the actual data collection. While sampling with so-called "informative
stopping  rules"  has  been  analyzed  in  many  statistical  contexts  (in  particular, 
see  DeGroot  (1970)),  no  research  on  the  use  of  such  rules  in  discrete  choice 
analysis  has  been  performed. 

The discussion of sample design in this paper concerns itself
exclusively with sampling rules of the stratified class. Before
introducing those special cases of this class which have been found most
useful for applications, three general observations are in order.

First, it is important to distinguish what aspects of the sampling
process the analyst does and does not control. What he does control is the
stratification (C x Z)_b, b = 1, ..., B and the number of decision makers N_b
to be drawn from each sub-population T_b. What he does not control are the
identities of the decision makers then drawn. These drawings are to be
independent and at random. Thus, the sample likelihood is:

$$L = \prod_{b=1}^{B} \prod_{n=1}^{N_b} \frac{f(i_n, z_n)}{F_b}\, H_b \tag{2}$$

where F_b is the fraction of the population T who are members of T_b. To see
that equation (2) is the sample likelihood, consider the likelihood of any
observation drawn via a stratified sampling rule. This is the probability
that the stratum (C x Z)_b containing the observation is selected, times the
conditional likelihood of drawing the observed (i,z) pair out of this
stratum. The former probability is the sampling fraction H_b. The latter
conditional likelihood is f(i,z)/F_b. Since observations are drawn independently,
equation (2) follows. We note for later use that the population fraction
F_b can be expressed as an integral of the joint distribution f(i,z) over the
subset (C x Z)_b, that is:

$$F_b = \int_{(C \times Z)_b} f(i,z)\, d(i,z).$$

Second, one should understand why samples produced by stratified
sampling rules yield information about P(i|z) and p(z). The reason, very
simply, is that the sample likelihood (2) is a function of the density values
f(i_n, z_n), n = 1, ..., N_b, b = 1, ..., B, and of the population fractions F_b, b = 1, ..., B;
these values are, of course, functions of P and p through equation (1). Note
that the sample likelihood also depends on the stratification imposed, on the
sampling fractions H_b, b = 1, ..., B, and on the sample size N. The sample design
theory discussed in this paper can usefully be viewed as the study of how these
control variables should be chosen so as to yield likelihoods with desirable
properties.

Our third observation is an operational one. In order to apply a
stratified sampling rule, the analyst must have a viable procedure for sampling at
random within each of the sub-populations T_b, b = 1, ..., B. This requirement
is sometimes difficult to meet, either because there are problems in
effectively separating the various sub-populations from one another, or because a
suitable mechanism for selecting decision makers independently at random is
elusive. These operational concerns in sampling will be discussed further
in Section 4.2.1.

Among  the  class  of  all  stratified  sampling  rules,  three  types  are  of 
particular  applied  interest.  These  are: 


3.1.1. Random Sampling: The stratification of C x Z is the trivial case in which
the entire population is a single stratum. In this case,
B = 1 and (C x Z)_1 = C x Z. Then

$$F_1 = \int_{C \times Z} f(i,z)\, d(i,z) = 1,$$

and H_1 = 1, so the sample likelihood (2) reduces to

$$L_r = \prod_{n=1}^{N} P(i_n \mid z_n)\, p(z_n). \tag{3}$$


3.1.2. Exogenous Sampling: The analyst partitions the attribute space Z into
mutually exclusive and exhaustive subsets Z_b, b = 1, ..., B, and lets
(C x Z)_b = C x Z_b. That is, the pair (i,z) is included in stratum (C x Z)_b if and only if
z ∈ Z_b, and the identity of i is not used in defining the stratum. Then

$$F_b = \int_{C \times Z_b} f(i,z)\, d(i,z) = \int_{Z_b} p(z)\, dz$$

and the sample likelihood becomes:

$$L_e = \prod_{b=1}^{B} \prod_{n=1}^{N_b} \frac{P(i_n \mid z_n)\, p(z_n)}{\int_{Z_b} p(z)\, dz}\, H_b. \tag{4}$$

This case corresponds to the example of income stratification given above,
where the subsets Z_b are the three income classes and choice of mode does not
affect the stratum to which a member of the population belongs.


3.1.3. Choice Based Sampling: The analyst partitions the choice set C into
mutually exclusive and exhaustive subsets C_b, b = 1, ..., B and lets
(C x Z)_b = C_b x Z. That is, (i,z) belongs to (C x Z)_b if and only if i ∈ C_b. Then

$$F_b = \int_Z \sum_{i \in C_b} f(i,z)\, dz = \int_Z \Big( \sum_{i \in C_b} P(i \mid z) \Big) p(z)\, dz$$

and the resulting sample likelihood is:

$$L_c = \prod_{b=1}^{B} \prod_{n=1}^{N_b} \frac{P(i_n \mid z_n)\, p(z_n)}{\int_Z \big( \sum_{i \in C_b} P(i \mid z) \big) p(z)\, dz}\, H_b. \tag{5}$$

This case corresponds to the example of stratification by modal user groups
given above, in which attributes of the modes or decision-makers do not enter
into the strata definitions.

Let  us  examine  these  three  types  of  sampling  rules.  Random  sampling,  the 
simplest,  is  a fully  specified  rule.  That  is,  once  the  analyst  is  committed 
to random sampling, he exercises no further control over the data
collection process.

In exogenous sampling, the analyst, through stratification of Z and
selection of the sample fractions H_b, partially controls the sample attribute
distribution

$$\frac{p(z)\, H_b}{\int_{Z_b} p(z)\, dz}, \qquad z \in Z_b.$$

Let g(z) designate this sample distribution. Then the exogenous sampling
likelihood can be written in the familiar form

$$L_e = \prod_{b=1}^{B} \prod_{n=1}^{N_b} P(i_n \mid z_n)\, g(z_n). \tag{6}$$


It has been common, although not strictly correct, to assert that under
exogenous sampling, the analyst has full control over the sample distribution
g(z), and to represent the sample design problem as one of selecting among
alternative such distributions.¹ As the above indicates, however,
exogenous sampling really offers more limited design possibilities, in that only
H_b and the stratification can be controlled, not the entire distribution g(z).

In  particular,  the  drawing  of  observations  out  of  each  stratum  is  done  at 
random,  and  hence  is  not  under  the  analyst's  control. 

Note that the exogenous sampling likelihood reduces to the random sampling
one if the analyst sets

$$H_b = F_b = \int_{Z_b} p(z)\, dz, \qquad b = 1, \ldots, B,$$

or, alternatively, samples so that the density of g will be g(z) = p(z), for all z ∈ Z. Finally,
we remark that the transportation home interview survey is often cited as an
example of exogenous sampling. While this example is often apt, it is not
always proper. In particular, if the choice being analyzed is that of
residential location, then the geographic stratification used in home interview
surveys is choice based rather than exogenous.

Choice based sampling rules give the analyst control over the frequencies
with which the various alternatives in C appear in the sample. The most
refined form of choice based sampling is that in which each alternative in C
defines a separate stratum. In this case, B is the choice set size and b = 1, ..., B
indexes the alternatives in C. For this stratification, the choice based
sampling likelihood may be written as follows:

$$L_c = \prod_{i \in C} \prod_{n=1}^{N_i} \frac{P(i \mid z_n)\, p(z_n)}{\int_Z P(i \mid z)\, p(z)\, dz}\, H_i. \tag{7}$$

The choice based sampling likelihood, like the exogenous sampling one, reduces
to that of random sampling in special cases. Specifically, this occurs whenever

$$H_b = F_b = \int_Z \Big( \sum_{i \in C_b} P(i \mid z) \Big) p(z)\, dz, \qquad b = 1, \ldots, B.$$

¹ See, for example, Lerman, Manski and Atherton (1975).

On  board  and  roadside  surveys  are  often  cited  as  examples  of  choice  based 
sampling  in  transportation.  As  indicated  before,  however,  even  home  interview 
surveys  may  in  some  contexts  be  choice  based. 

It is of interest to observe that the sample likelihood associated with
random sampling may be achieved using other stratified sampling rules as well.
In particular, it is easy to see that if the analyst chooses any
stratification and sample composition such that H_b = F_b, all b = 1, ..., B, then the general
stratified sampling likelihood (2) reduces to the random sampling one (3).
Since the sample likelihood embodies all information in the data sample, all
stratified rules satisfying the vector condition H = F, where H = (H_b, b = 1, ..., B)
and F = (F_b, b = 1, ..., B), are statistically equivalent.
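The reduction of likelihood (2) to likelihood (3) under H = F can be checked numerically. The sketch below assumes a small discrete C x Z with a hypothetical joint distribution f(i,z) and a choice based stratification by mode; all names and values are illustrative. It evaluates the log of (2) for a fixed set of observations under H = F and under equal sampling fractions.

```python
import math

def stratified_log_likelihood(sample, f, stratum_of, H, F):
    """Log of likelihood (2): sum over observations of log(f(i,z) * H_b / F_b)."""
    total = 0.0
    for i, z in sample:
        b = stratum_of(i, z)
        total += math.log(f[(i, z)] * H[b] / F[b])
    return total

# Hypothetical joint distribution f(i,z) on a small discrete C x Z.
f = {("transit", "low"): 0.10, ("transit", "high"): 0.05,
     ("auto", "low"): 0.25, ("auto", "high"): 0.60}

def by_mode(i, z):
    """Choice based stratification: stratum 0 is transit, stratum 1 is auto."""
    return 0 if i == "transit" else 1

# Population fractions F_b implied by f and the stratification.
F = [0.0, 0.0]
for (i, z), mass in f.items():
    F[by_mode(i, z)] += mass

sample = [("transit", "low"), ("auto", "high"), ("auto", "low")]
random_ll = sum(math.log(f[obs]) for obs in sample)                  # log of (3)
equal_ll = stratified_log_likelihood(sample, f, by_mode, F, F)       # H = F
other_ll = stratified_log_likelihood(sample, f, by_mode, [0.5, 0.5], F)
```

With H = F the factor H_b/F_b is one for every observation, so the two log likelihoods agree; with any other H they generally differ.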
4. SAMPLE DESIGN FOR DESCRIPTION OF ATTRIBUTE DISTRIBUTION


The  preliminaries  for  our  discussion  of  sample  design  in  discrete 
choice  analysis  have  now  been  completed  and  we  turn  in  this  section  to 
the  problem  of  learning  the  attribute  distribution  characterized  by  the 
density p(z).¹ Organizationally, it is convenient to first treat this
problem within the idealized world of theory, then address the practical
concerns  that  arise  from  incompleteness  of  the  theory  and  divergences 
between  the  idealized  and  real  worlds.  This  same  sequence  of  presentation 
will  be  followed  in  the  next  section  when  the  problem  of  learning  the  choice 
probabilities  is  discussed. 

4.1  THEORETICAL  RESULTS 

Let  us  first  recall  the  reason  why  we  should  like  to  know  the  attribute 
distribution,  and  clarify  the  sense  in  which  this  distribution  is  to  be 
learned. 

The  travel  demand  forecasting  process  requires  knowledge  of  the  present 
attribute distribution, in order to determine the distribution that would
prevail after a hypothesized policy or environmental shift has occurred. Let us
be precise. In discrete choice analysis, a policy shift is simply a function
changing each decision-maker's present attribute value into some new value.

If p(z) is the current attribute density and if y(z) is the function mapping
old into new attribute values, then clearly p and y together determine $\tilde{p}$,
the post-policy attribute density. For example, the attribute vector z might
include as its k-th entry, z_k, the transit fare, which might be 25¢ currently.
If the only policy change being evaluated were a doubling of fare, then
y(z_k) would reduce to y_k = 2 z_k, and the corresponding post-policy distribution
of the entire attribute vector, $\tilde{p}(y)$, could readily be derived.

¹ A technical note is required here. The distribution of attributes in the
population may be expressed through a cumulative distribution function or through
its derivative, the probability density function. While the likelihood of an
observation is defined in terms of the density function, it turns out that for
travel demand analysis, it is the distribution function, not the density, that
must be learned. See Section 4.1.1 for further elaboration of this point.

Once $\tilde{p}$ and the (time invariant) choice probabilities P(i|z) are known,
all consequences of the policy shift can be determined. For example, the
new expected share of the population choosing alternative i is defined by

$$\tilde{Q}(i) = \int_Z P(i \mid z)\, \tilde{p}(z)\, dz.$$

Because  the  attribute  distribution  simply  describes  the  existing  travel 
environment  and  is  not  derived  from  any  causal  model,  it  is  generally  assumed 
in  discrete  choice  analysis  that  one  knows  little,  if  anything,  about  the  form 
of p(z) a priori. In particular, unlike the behaviorally derived choice
probabilities, the attribute density is usually not specified to be a member of
any parametric family. Thus, learning the attribute distribution means
learning the whole distribution function, not merely some parameters
characterizing this function.¹

How then may the attribute distribution be learned? Two approaches,
both of which are correct in theory and useful in practice, shall be discussed.

4.1.1 The Representative Sample Approach

The  simpler  but  less  powerful  of  the  two  approaches  is  as  follows. 

Select  a sampling  rule  such  that  the  likelihood  of  observing  any  attribute 
value  z on  each  draw  is  p(z).  Then  draw  an  actual  sample  of  decision-makers 
according  to  this  rule.  Use  the  resulting  distribution  of  z values  in  the 
sample  as  an  estimate  of  the  attribute  distribution  in  the  population. 

¹ Note that it will not, in general, be sufficient to learn lower order moments
of p(z), say its mean and covariance matrix. The values we should like to
forecast, such as $\tilde{Q}(i)$, depend on the entire density $\tilde{p}$ and hence on all of p.
The theoretical basis for the above procedure is the fact that the
empirical distribution of z in the sample will in general converge to the
population distribution as the sample size increases.¹ Hence, in large enough
samples, the sample distribution of z will be appropriately "close" to the
population distribution. The procedure is useful in practice because once
the sample is drawn, the analyst can simply treat the sample of
decision-makers as if it were the whole population, and produce forecasts using this
sample. This is the so-called "random sample enumeration" method of
forecasting. To see how this works, consider the problem of forecasting the
expected share of the population choosing some alternative i after a policy
change. This share, it will be recalled, is

$$\tilde{Q}(i) = \int_Z P(i \mid z)\, \tilde{p}(z)\, dz.$$

Given a sample of N individuals, it may be estimated by

$$\hat{Q}(i) = \frac{1}{N} \sum_{n=1}^{N} P(i \mid \tilde{z}_n),$$

where $\tilde{z}_n = y(z_n)$ is the post-policy attribute value of the n-th sampled
individual. This estimate will be consistent as long as the sampling rule used
satisfies the property described above.²
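The random sample enumeration forecast can be sketched as follows. We assume, for illustration only, that the single attribute is the transit fare, that the choice model is a binary logit with hypothetical parameter values, and that the policy mapping y doubles the fare; the sample itself is synthetic.

```python
import math
import random

def p_transit(fare, theta_fare=-1.2, const=0.4):
    """Binary logit P(transit | z); the parameter values are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(const + theta_fare * fare)))

def enumerate_share(sample_z, choice_prob, policy=lambda z: z):
    """Q_hat(i) = (1/N) * sum_n P(i | y(z_n)), with y the policy mapping."""
    return sum(choice_prob(policy(z)) for z in sample_z) / len(sample_z)

# Hypothetical random sample of fares z_n drawn from the population.
rng = random.Random(42)
sample_z = [rng.choice([0.25, 0.50, 0.75]) for _ in range(1000)]

share_before = enumerate_share(sample_z, p_transit)
share_after = enumerate_share(sample_z, p_transit, policy=lambda fare: 2 * fare)
```

Only the sampled attribute values enter the computation; no estimate of the density p(z) is formed at any point.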

We now must specify sampling rules which do have the desired property
that the likelihood of observing z is p(z).

Consider the set of stratified sampling rules in which H_b = F_b, all b = 1, ..., B.
These are rules for which the fraction of the sample in each stratum
equals the share of the population in that stratum. All rules meeting the
condition, it will be recalled, yield the random sampling likelihood in which
the likelihood of any (i,z) observation is P(i|z)p(z). Clearly, the marginal
likelihood of any z observation is

$$\sum_{i \in C} P(i \mid z)\, p(z) = p(z).$$

Hence, all stratified rules satisfying the H = F conditions are appropriate
for learning p(z).

Since the sampling rules satisfying the H = F conditions are
statistically equivalent, the analyst's choice among such rules can, with one
strong caveat, be based on considerations of relative sampling costs. The
caveat is that, given any stratification, the sampling fraction H_b in
stratum b is known and under the control of the analyst, but F_b, the share of
the population in stratum b, may not be known. Thus, to devise a sample
with H = F, we must select a stratification for which the values of F_b are
a priori known. This is certainly a non-trivial requirement, particularly
as the F values are themselves generally functions of the unknown p(z)
density. (The one circumstance in which F is trivially known is the case of
random sampling, where B = 1 and F_1 = 1.) Nevertheless, the requirement can
often be met in practice. This point will be discussed at length in Section
4.2.2.

¹ See Rao (1973), Section 6f.1.

² Note that construction of the estimate $\hat{Q}(i)$ only requires the empirical
distribution of sample points. It does not require one to estimate the density
p(z).

4.1.2  Representative  Sub-Sample  Approach 

It is significant that the sampling rules satisfying the H = F
condition are not the only ones from which the attribute distribution may be
learned. In fact, any stratified rule can be used as long as H_b is made
positive whenever F_b is positive. To see this, let (C x Z)_b, b = 1, ..., B be
an arbitrary stratification, let Ψ(z) be the cumulative attribute
distribution in T, and let Ψ(z|b) designate the cumulative attribute distribution
among decision makers in stratum b. Consider the identity

$$\Psi(z) = \sum_{b=1}^{B} \Psi(z \mid b)\, F_b \tag{8}$$

and observe that in stratified sampling, N_b observations are drawn at random
from T_b. The empirical cumulative distribution of z in these N_b observations,
designated $\hat{\Psi}(z \mid b)$, is therefore a consistent estimate of the sub-population
attribute distribution Ψ(z|b). From (8), it then follows that
$\sum_{b=1}^{B} \hat{\Psi}(z \mid b)\, F_b$ is
a consistent estimate of the population attribute distribution Ψ(z).
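Identity (8) suggests a direct computation: form the empirical cumulative distribution within each stratum and weight it by the known population share. The sketch below does this for a single scalar attribute. The income figures and the mode shares are hypothetical, and the function name is ours.

```python
def stratified_cdf(z, strata_samples, F):
    """Estimate Psi(z) = sum_b Psi_hat(z|b) * F_b from per-stratum samples.

    strata_samples : list of lists; strata_samples[b] holds the scalar
                     attribute values observed in stratum b
    F              : known population shares F_b (summing to one)
    """
    total = 0.0
    for obs, F_b in zip(strata_samples, F):
        ecdf_b = sum(1 for v in obs if v <= z) / len(obs)  # Psi_hat(z|b)
        total += ecdf_b * F_b
    return total

# Hypothetical incomes (in $1000s) from an on-board survey (transit users)
# and a roadside survey (auto users), with assumed mode shares 0.2 and 0.8.
transit_incomes = [6, 8, 9, 12, 15]
auto_incomes = [10, 14, 18, 22, 30]
F = [0.2, 0.8]

share_at_or_below_12 = stratified_cdf(12, [transit_incomes, auto_incomes], F)
```

The weights are the population shares F_b, not the sample shares, which is why the sampling fractions H_b may be chosen freely as long as every stratum with positive F_b is sampled.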
What this result implies is that consistent estimates of the
distribution of socioeconomic characteristics (such as income and auto ownership),
as well as level of service variables (such as time and cost), can be recovered
from stratified samples as long as the population shares in each stratum are
known. For example, suppose one had a roadside interview and an on-board
survey (choice-based samples for the car and transit modes respectively).

Using  the  data  from  these  surveys  as  empirical  distributions,  and  a priori 
knowledge  of  mode  shares,  equation  (8)  can  be  used  to  estimate  the  attribute 
distribution  for  the  entire  population. 

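To see how the above approach may be used in practice, consider again
the problem of forecasting the post-policy aggregate share $\tilde{Q}(i)$. Observe
that

$$\tilde{Q}(i) = \sum_{b=1}^{B} \tilde{Q}(i \mid b)\, F_b,$$

where $\tilde{Q}(i \mid b)$ is the aggregate share choosing i among
the sub-population T_b. Note that $\tilde{Q}(i \mid b)$ may be consistently estimated by

$$\hat{Q}(i \mid b) = \frac{1}{N_b} \sum_{n=1}^{N_b} P(i \mid \tilde{z}_n),$$

where $\tilde{z}_n$ denotes the post-policy attribute value of the n-th observation
in stratum b. Hence $\tilde{Q}(i)$ may be estimated by

$$\hat{Q}(i) = \sum_{b=1}^{B} \hat{Q}(i \mid b)\, F_b.$$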
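The stratified forecast just described can be sketched in the same way as the enumeration forecast, with the within-stratum averages weighted by F_b. The binary logit model, its parameter values, the fares and the population shares below are all hypothetical and serve only to illustrate the arithmetic.

```python
import math

def p_transit(fare, theta_fare=-1.2, const=0.4):
    """Binary logit P(transit | z); parameter values are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(const + theta_fare * fare)))

def stratified_share(strata_z, F, choice_prob, policy=lambda z: z):
    """Q_hat(i) = sum_b F_b * (1/N_b) * sum_n P(i | y(z_n))."""
    total = 0.0
    for obs, F_b in zip(strata_z, F):
        q_b = sum(choice_prob(policy(z)) for z in obs) / len(obs)  # Q_hat(i|b)
        total += q_b * F_b
    return total

# Hypothetical fares faced by sampled travellers in two strata whose
# population shares F_b are assumed known a priori.
strata_z = [[0.25, 0.25, 0.50], [0.50, 0.75, 0.75, 1.00]]
F = [0.3, 0.7]

def double_fare(fare):
    return 2 * fare

share_after = stratified_share(strata_z, F, p_transit, policy=double_fare)

# For comparison: pooling all observations ignores the known shares F_b.
pooled_z = [z for obs in strata_z for z in obs]
pooled = stratified_share([pooled_z], [1.0], p_transit, policy=double_fare)
```

The pooled figure weights each stratum by its share of the sample rather than its share of the population, so it differs from the stratified estimate whenever H and F differ.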

Four remarks should be made about the above procedure. First,
given any stratification, prior knowledge of the F values is required to
implement the procedure. Second, random sampling falls within the class
of procedures as we may simply set B = 1. Third, as long as the shares of
the population in each stratum are known, the use of the above procedure
does not require knowledge of the choice process; it is based on a
simple probability identity which does not involve P(i|z). Fourth, for
a given total sample size, the $\sum_{b=1}^{B} \hat{\Psi}(z \mid b)\, F_b$ estimates resulting from
different stratifications and sample compositions are not, in general,
statistically equivalent.¹ Unfortunately, there exists very little
theory to help one select among alternative designs. We shall, however,
offer some heuristic guidance on this question.

¹ This point is discussed in Section 4.2.3.
4.2 PRACTICAL CONCERNS


The foregoing theoretical discussion of sampling to learn
the attribute distribution is incomplete as a basis for selecting
sample designs in practice. First, we have as yet said nothing about
the feasibility of implementing stratified sampling rules. Second,
given any implementable rule, we have not indicated how the
requirement for prior knowledge of the F values, necessary for estimation
of the attribute distribution, can be met. Third, we have thus far offered no
assistance to the analyst in selecting among those rules which are
implementable and whose associated F values can be determined. These practical concerns
in sample design are addressed below.

4.2.1  Problems  of  Implementation^

Given a population T, the process of selecting a stratification
$T_b$, b=1,...,B and then sampling at random within each $T_b$
appears deceptively simple. Actually, implementation of this
process always requires careful thought and often some theoretical
compromise. The reason is that in order to sample at random within
a sub-population, one must first be able to isolate this
sub-population for purposes of sampling. In practice, such isolation
is sometimes difficult to achieve.

Two examples will serve to illustrate the point. Let the
population of interest be the set of all people potentially
making trips within a metropolitan area. First consider
stratification based on place of residence. A home interview survey
can easily isolate and sample from the sub-population of potential
trip-makers who are residents of the area. It is, however, far
more difficult to isolate and sample from the remaining
sub-population, namely non-residents.

^The discussion in this section pertains to the problem of estimating the
choice probabilities as well as to that of estimating the attribute
distribution.

Now consider the same population and let the stratification
be based on the mode used in trip-making on a given day. Transit
users will be relatively easy to isolate because of their physical
proximity in transit vehicles and stations. Automobile users
will generally be more difficult to isolate as a group. Then, of
course, there is the sub-population who make no trip on the given
day. Non-trip making residents may be isolated through a home
interview survey on that day. How non-trip making non-residents
can be sampled is not clear.

The  above  examples  are  fairly  typical  of  the  practical  difficulties 
that  may  arise  in  isolating  sub-populations.  There  do  exist  some 
situations  in  which  no  practical  way  can  be  found  to  sample  from 
some  sub-population.  The  non-trip  making  non-residents  in  the 
above  examples  may  be  such  a case.  In  these  situations,  the  analyst 
may do one of two things, neither very palatable. First, he may
ignore the problematic sub-population, that is, define the population T
so as to exclude it. Second, he may assume that the attribute
distribution in this sub-population is identical to that in some "similar"
sub-population which can be sampled.

One further warning should be given to conclude this discussion.
When a means of isolating and sampling from a sub-population has been
found, care must still be taken to ensure that the sample is drawn at
random. How this essential requirement can be satisfied in practice
must be determined on a case-by-case basis, but some potential problems
can at least be highlighted.

Consider,  for  example,  the  following  three  surveys: 

1.  an on-board survey of passengers on selected bus routes;

2.  a roadside  interview  at  various  points  in  the  city; 

3.  a mailback  survey  sent  to  a random  selection  of  households. 

In the first survey, the relevant sub-population might be all transit
riders. However, the need to choose which routes to survey makes
achievement of random drawings of transit users difficult. Some routes may
have a high percentage of elderly users, while others may attract
primarily workers. Furthermore, if a sample is taken on a single day,
some transit users may be interviewed more than once, and such individuals
are likely to have very different characteristics than the rest of the
sub-population.

The same problems arise in the second example, where the objective
would presumably be to draw randomly from all auto users.

In  the  third  example,  the  high  rejection  rate  generally  associated 
with  mailback  surveys  makes  attainment  of  random  drawing  extremely 
difficult.  It  is  often  unlikely  that  people  who  choose  to  respond 
to  mailback  questionnaires  have  the  same  attribute  distribution  as  the 
population  as  a whole. 

4.2.2  Determination  of  Sub-Population  Sizes 

Given a population stratification, there exist at least four distinct
ways one might determine the sub-population sizes $F_b$, b=1,...,B:
(a) direct measurement; (b) estimation from a random sample; (c)
solution of a set of linear equations; and (d) estimation with the
choice model. These four approaches are described in more detail.

a.  Direct Measurement: For some stratifications, the sub-population
sizes may be measured directly. Two examples will suffice to illustrate
this approach. First, let the population be the set of residents of an
SMSA and consider stratification by location of their residence within
the SMSA. If each sub-population contains the residents of an integral
number of Census tracts, Census population data will provide relatively
accurate measures of the sub-population sizes at given points in time.
Second, let the population be the set of individuals making work trips
within the SMSA and consider a stratification by mode. Rush hour transit
fare and highway cordon counts might then provide adequate measures of
transit and auto usage on work trips. (Such counts cannot be perfect
measures, as some trips made during rush hour are not work trips, and
some work trips are made at times other than rush hour.)

b.  Estimation From a Random Sample: If one draws a random sample of
decision makers from T, then the sample distribution of (i,z) pairs
is a consistent estimator of the population distribution f(i,z). It
follows that for any stratification $(C \times Z)_b$, b=1,...,B, the
fraction of the random sample who belong to each stratum is a consistent
estimate of $F_b$. Thus, given any stratified rule, the associated F
values can be estimated if an auxiliary random sample is available or
can be drawn.
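As a minimal sketch of approach (b), the fraction of an auxiliary random sample falling in each stratum can be tabulated directly. The function name and the stratum labels below are hypothetical; the only input needed per respondent is enough information to identify his or her stratum.

```python
from collections import Counter

def estimate_stratum_fractions(stratum_labels):
    """Return the sample fraction in each stratum, an estimate of F_b."""
    n = len(stratum_labels)
    if n == 0:
        raise ValueError("the random sample is empty")
    counts = Counter(stratum_labels)
    return {b: c / n for b, c in counts.items()}

# Hypothetical stratum memberships observed in a random sample of ten people
labels = ["transit", "auto", "auto", "carpool", "auto",
          "transit", "auto", "carpool", "auto", "auto"]
F_hat = estimate_stratum_fractions(labels)
```

A stratum that happens to be absent from the random sample receives no entry at all, which is one practical reason the auxiliary sample cannot be too small.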

One important issue that must be highlighted is that the cost of a
random survey designed solely to determine values of $F_b$, b=1,...,B
should not be compared with the costs for random surveys to determine an
empirical attribute distribution. The former only requires information
from each respondent sufficient to identify the stratum to which he/she
belongs. This typically will consist of a small set of socioeconomic
characteristics and/or the actual choice $i \in C$ made. The latter survey
requires the full set of attributes, typically including level of service
for each alternative in the choice set.

c.  Solution from Set of Linear Equations: Recall the identity

(8)  $$\Psi(z) = \sum_{b=1}^{B} \Psi(z|b) \cdot F_b$$

introduced earlier in this section. Let x(z) be any vector valued function
of z. Then, letting E designate the expectation operation, it follows from
(8) that

(9)  $$E(x) = \sum_{b=1}^{B} E(x|b) \cdot F_b .$$

Imagine that the values E(x) and E(x|b), b=1,...,B were known.
Then the vector equation (9) plus the identity $\sum_{b=1}^{B} F_b = 1$
would form a set of linear equations in the unknown parameters
$F_b$, b=1,...,B. In particular, if the x vector has at least
|C| - 1 (where |C| denotes the number of alternatives in C) components,
then, given usual linear independence conditions, this set of
equations could be uniquely solved for the $F_b$ values.

Observe now that if a stratified sample is drawn, the
sample mean of x among those decision-makers belonging to $T_b$
is a consistent estimate for E(x|b). As for the population mean
E(x), these values are, for many x functions, available from
published sources. For example, if the population is the set of
residents of an SMSA, often Census tables will provide the mean
of income, age, education and similar socio-economic and demographic
variables. If the population is the set of automobile
owners in a state, statewide registration figures may provide
mean vehicle age, type, etc. Clearly, the key to determining F
by solving equations of the form (9) is to search out functions
x(z) for which published population means are available. If one
is imaginative, this search will often be successful.

As an example, return to the simple three mode case, and
suppose one knew from Census data that the average income and auto
ownership in the population were $11,300 and .94 respectively.
Suppose further that one had an on-board survey and roadside interview
that provided the following estimated expected values:


      Mode                  Average Income    Average Auto Ownership

1.    Carpool Users            $10,000                 .8
2.    Drive Alone Users         17,000                1.4
3.    Transit Users              6,000                 .6

In this case, the three modal user groups would be the relevant
strata (corresponding in this case to a choice-based stratification).
The equations implied by (9) would be:

$$11{,}300 = 10{,}000\,F_1 + 17{,}000\,F_2 + 6{,}000\,F_3$$

$$.94 = .8\,F_1 + 1.4\,F_2 + .6\,F_3$$

$$1 = F_1 + F_2 + F_3$$

The resulting solution implied by this would be $F_1 = .50$, $F_2 = .30$,
and $F_3 = .20$.
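The three equations of this example can be checked numerically. The sketch below uses NumPy's general linear solver; the variable names are illustrative only, and the coefficients are the figures quoted in the example.

```python
import numpy as np

# Each row is one equation; columns correspond to F_1 (carpool),
# F_2 (drive alone) and F_3 (transit).
A = np.array([
    [10000.0, 17000.0, 6000.0],  # stratum mean incomes
    [0.8, 1.4, 0.6],             # stratum mean auto ownership
    [1.0, 1.0, 1.0],             # adding-up identity
])
rhs = np.array([11300.0, 0.94, 1.0])  # population means and the constant 1

F = np.linalg.solve(A, rhs)
```

Solving the system reproduces the shares stated in the text, approximately (.50, .30, .20) up to floating point error.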

It should be noted that the above procedure for determining
F can be implemented using population medians rather than means.
We indicate here only the simplest case. Let z be a scalar,
let x(z) = z, and let $z_m$ be the population median of z, assumed
known from published sources. Then it follows from (8) that

(10)  $$\frac{1}{2} = \Psi(z_m) = \sum_{b=1}^{B} \Psi(z_m|b) \cdot F_b .$$

For each b=1,...,B, the quantity $\Psi(z_m|b)$ is consistently estimated
by the fraction of sampled decision makers belonging to $T_b$ whose z
value lies below $z_m$. Using this estimate in (10), a linear equation in
F results.
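With only two strata, the median equation (10) and the adding-up identity F_1 + F_2 = 1 can be solved by hand: F_1 = (1/2 - Ψ(z_m|2)) / (Ψ(z_m|1) - Ψ(z_m|2)). The following sketch implements that special case; the function name and the input fractions are hypothetical.

```python
def two_stratum_fractions(psi_1, psi_2):
    """Solve 1/2 = psi_1*F_1 + psi_2*F_2 subject to F_1 + F_2 = 1.

    psi_b is the estimated fraction of stratum b lying below the
    population median z_m.
    """
    if psi_1 == psi_2:
        raise ValueError("identical fractions: F is not determined")
    f1 = (0.5 - psi_2) / (psi_1 - psi_2)
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("estimates are inconsistent with valid shares")
    return f1, 1.0 - f1

# Hypothetical estimates: 80% of stratum 1 and 30% of stratum 2
# fall below the published population median.
f1, f2 = two_stratum_fractions(0.8, 0.3)
```

The guard on equal fractions reflects the linear independence condition: if both strata have the same share below the median, equation (10) carries no information about F.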

We must also point out that when one attempts to solve for
F using eq. (9), the replacement of E(x|b) by a sample estimate
implies that the equations to be solved are no longer exact.
It is also likely that more than the minimal set of |C| - 1 estimated
expected values will be available, in which case the problem of solving
(9) becomes one of finding some "best fit" values of $F_b$ according to
some criterion (e.g. least squares). For both of these reasons, the
values of F that emerge as solutions will themselves only be estimates of
the true F values. This comment of course continues to apply if eq. (10)
is used rather than eq. (9).
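One possible reading of the "best fit" idea is ordinary least squares on the stacked equations. The sketch below is an assumption-laden illustration, not a procedure given in the text: the third moment and its values are hypothetical, each equation is divided by its population mean so that the rows are on comparable scales, and the adding-up identity is treated as one more equation rather than as an exact constraint.

```python
import numpy as np

# Rows: one moment each; columns: strata b = 1, 2, 3.
stratum_means = np.array([
    [10000.0, 17000.0, 6000.0],  # income, from the three mode example
    [0.8, 1.4, 0.6],             # auto ownership, from the same example
    [2.0, 3.0, 2.5],             # hypothetical third variable
])
population_means = np.array([11300.0, 0.94, 2.4])  # last entry hypothetical

# Rescale each equation by its population mean (a design choice, assumed here)
scaled = stratum_means / population_means[:, None]

# Append the adding-up identity sum_b F_b = 1 as a fourth equation
A = np.vstack([scaled, np.ones(3)])
y = np.append(np.ones(3), 1.0)

F_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the hypothetical third moment was chosen to be consistent with the other two, the fit here is exact; with sample estimates of E(x|b) the four equations would generally conflict and the least squares solution would only approximate each of them.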

Finally, we note that the procedure of solving for F described
here differs in an important way from the direct measurement
and random sample estimation methods described earlier. That
is, the present procedure determines F only after the stratified
sample has been drawn, while the others do so prior to the drawing.
This fact represents a drawback to the present procedure because,
given a stratification, prior knowledge of F can be useful in
selecting the sample composition $H_b$, b=1,...,B. Why this is so
will be made clear in Section 4.2.3.


d.  Estimation with the Choice Model: Given a stratified sample and
having specified a parametric form for the choice probabilities
$P(i|z,\theta^*)$, it is usually theoretically possible to jointly
estimate the parameter vector $\theta^*$ and the population fractions F.^
This approach is like (c) above in that it determines F only after a
sample has been drawn. It differs from (a), (b), and (c) in that
consistency of the F estimates obtained here depends on the correctness of
the parametric model assumed for the choice probabilities. In contrast,
the earlier approaches make no use of the choice probabilities whatsoever.

4.2.3.  Selection  Among  Alternative  Designs 

Among the class of all possible stratifications of a population,
some will not be implementable because relevant sub-populations
cannot be isolated or sampled from at random. Others will
not be usable for estimation of the attribute distribution
because the sub-population sizes F cannot be determined. Still,
in most applied contexts there are likely to exist many feasible
stratifications, and for each of these a set of alternative possible
sample compositions. How then should the analyst select among
these?

Unfortunately, relatively little guidance can presently be
given. The choice among sample designs depends, of course, both on
the relative costs and on the quality of the approximations
to p associated with different designs. Sampling costs can only
be determined on a case by case basis, so let us concentrate on the
quality of approximation issue.

^See Manski and McFadden (1977) for details. Exceptions to the result
are that F cannot be estimated in this way if the sampling is exogenous
or if the choice model has the conditional logit form with alternative
specific constants.

Consider again the identity

(8)  $$\Psi(z) \equiv \sum_{b=1}^{B} \Psi(z|b) \cdot F_b$$

which forms the basis for stratified sampling estimation of p. On the basis
of (8), some useful heuristic statements about the accuracy of alternative
designs can be made.

First observe that given a sub-sample size $N_b$, the empirical
distribution of z values among the $N_b$ observations approximates
$\Psi(z|b)$ best when this conditional density has all its mass
concentrated on a single z point, that is, when the sub-population $T_b$
is homogeneous in z. This suggests that a good stratification is one that
separates the population into groups which are relatively homogeneous in
z. In general, internal homogeneity of the sub-populations can be enhanced
by increasing their number, that is, by more finely partitioning C x Z.
Note, however, that given a fixed total sample size, the larger B is, the
fewer observations can be drawn from each sub-population; hence, the less
accurate is each empirical z distribution as an estimate of the
sub-population distribution. Thus, in selecting among stratifications,
there is a tension between the desire for internal homogeneity of each
sub-population and the need for an adequate number of observations per
sub-population.

Assume now that a stratification has somehow been selected. The analyst
must then select a sample composition. Holding the total sample size fixed,
two directives for choosing a composition can be given. First, observe that
the influence of each conditional distribution $\Psi(z|b)$ on the
population distribution $\Psi(z)$ increases directly with the value of
$F_b$. This suggests that, all else equal, the larger $F_b$ is, the larger
the sample fraction $H_b$ should be. Second, recall that the more
homogeneous $T_b$ is in z, the fewer observations are needed to achieve
any accuracy in estimating $\Psi(z|b)$. This suggests that, all else
equal, the more homogeneous $T_b$ is, the smaller $H_b$ should be.

Finally, consider the choice of a sample size holding the sample
composition fixed. It is easy to show that if $N_b$ observations are drawn
from sub-population $T_b$, the empirical distribution $\hat{\Psi}(z|b)$
is, when evaluated at any value $z_0$, a binomial random variable with
mean $\Psi(z_0|b)$ and standard error
$\left( \Psi(z_0|b)\,(1 - \Psi(z_0|b)) / N_b \right)^{1/2}$. Thus,
increasing accuracy may always be obtained by taking larger samples but,
as measured by the standard error, accuracy increases only as
$\sqrt{N_b}$. No general guidance can be offered as to how large
a sample is "large enough." This question must be dealt with on a case by
case basis.
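The square-root behaviour of the standard error is easy to see numerically. The helper below is a small sketch (the function name and inputs are hypothetical) that evaluates the binomial standard error of the empirical distribution at a single point.

```python
import math

def empirical_cdf_std_error(psi, n_b):
    """Standard error of the empirical distribution at a point where the
    true sub-population distribution equals psi, given n_b observations."""
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi must lie between 0 and 1")
    if n_b <= 0:
        raise ValueError("n_b must be positive")
    return math.sqrt(psi * (1.0 - psi) / n_b)

se_100 = empirical_cdf_std_error(0.5, 100)
se_400 = empirical_cdf_std_error(0.5, 400)
ratio = se_100 / se_400
```

Quadrupling the sub-sample size only halves the standard error, which is the sense in which accuracy grows with the square root of the number of observations.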

The reader familiar with classical sampling theory will recognize that
the above heuristic statements are extrapolations of well known results on
optimal (i.e., minimum variance) sampling for estimation of a population
mean. Here, interest is in estimating the entire density p(z), not simply
the mean E(z). Hence, the classical results offer guidance but do not
apply directly.
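For reference, the classical result alluded to is usually stated as Neyman allocation: for estimating a population mean with a fixed total sample size and equal sampling costs, the variance-minimizing sample fraction in stratum b is proportional to F_b times the within-stratum standard deviation. The sketch below implements that classical rule only; it is not a rule derived in this paper for estimating the full density, and the function name and inputs are hypothetical.

```python
import numpy as np

def neyman_allocation(F, sigma):
    """Classical allocation for estimating a mean: H_b proportional to
    F_b * sigma_b, where sigma_b is the within-stratum standard deviation."""
    F = np.asarray(F, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    weights = F * sigma
    total = weights.sum()
    if total <= 0:
        raise ValueError("at least one stratum must have positive weight")
    return weights / total

# Hypothetical shares and within-stratum standard deviations
H = neyman_allocation([0.5, 0.3, 0.2], [1.0, 2.0, 1.0])
```

Both heuristic directives appear in the formula: a larger share F_b raises H_b, and a more homogeneous stratum (smaller standard deviation) lowers it. In this example the second stratum receives the largest sample fraction even though it is not the largest stratum.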

There is one further respect, ignored in the above discussion, in which
the present sampling problem may differ from the classical one. That is, in
practice we may only have estimates of the sub-population sizes $F_b$, not
the true values. In contrast, the classical literature always assumes that
the true values are available. The consequences of using F estimates in
stratified sampling estimation of p(z) have not yet been explored.

5.  SAMPLE  DESIGN  FOR  CHOICE  MODEL  ESTIMATION


We now turn to the problem of learning the choice probabilities P(i|z).
As mentioned earlier, the literature on discrete choice analysis universally
assumes that these probabilities are a priori specified up to the value of
a finite parameter vector. As before, we will let $\theta^*$ designate the
true value of this vector and let the choice probabilities be written as
$P(i|z,\theta^*)$. It is furthermore generally assumed that the analyst
has no prior knowledge of $\theta^*$. Section 5.1 summarizes the
theoretical literature on the estimation of $\theta^*$ given a sample of
(i,z) observations and on the design of samples to be used in such
estimation. Section 5.2 then assesses the practical implications of the
existing theory for travel demand analysis.

5.1  THEORETICAL  RESULTS 

We  first  briefly  review  the  quite  comprehensive  literature  on  choice 
model  estimation  given  a sample  design.  We  then  present  the  contrastingly 
small  set  of  available  results  relevant  to  the  problem  of  selecting  among 
alternative  designs. 

5.1.1.  Estimation  Given  a Sampling  Rule 

From the perspective of sample design, the most important proven
theoretical result on choice model estimation is certainly the following:
Subject to certain technical conditions, any stratified sampling rule
provides a basis for consistent estimation of $\theta^*$.

The above finding emerges from the intensive investigation of maximum
likelihood and related methods of estimation of $\theta^*$ made by Manski
and McFadden (1977), and extended by Cosslett (1977). Maximum likelihood
estimation rests, of course, on a re-interpretation of the sample
likelihood as a function of those of its determinants which are unknown to
the analyst. Consider then the stratified sampling likelihood originally
given in equation (2), and rewritten here as


(2')  $$L = \prod_{b=1}^{B} \prod_{n=1}^{N_b} \frac{P(i_n|z_n,\theta^*)\, p(z_n)}{\int_{(C \times Z)_b} P(j|y,\theta^*)\, p(y)\, d(j,y)}$$


If the attribute density p(z) is a priori known, a maximum likelihood
estimate for $\theta^*$ is obtained by treating L as a function of
$\theta^*$, and finding the maximum of this function. Less transparently,
if p(z) is unknown, the likelihood may be treated as a function of the
pair of unknowns $\theta^*$ and p, and maximized jointly over all possible
$\theta^*$ values and all possible attribute densities p. The above
maximizations may be carried out in an unconstrained manner. If the
sub-population sizes $F_b$, b = 1,...,B are known, the constraints

$$F_b = \int_{(C \times Z)_b} P(j|y,\theta^*)\, p(y)\, d(j,y), \quad b = 1,\ldots,B$$

can be imposed.

It is shown in Manski and McFadden (1977), and Cosslett (1977), that
subject to technical conditions, all of the above variants of the maximum
likelihood method yield consistent estimates for $\theta^*$ whatever the
stratified sampling rule used to generate the data. Among the required
technical conditions, there are two of practical importance, one a
condition on the probability model and the other a restriction on the
sampling process.

The probability model condition is that $\theta^*$ be identified in the
population. Roughly, this means that there must exist no vector of
parameters other than $\theta^*$ which yields exactly the same choice
probabilities as $\theta^*$ for all (i,z) pairs. Clearly, if there did
exist some $\theta$ always yielding the same choice probabilities as
$\theta^*$, the sample likelihood would always be identical when evaluated
at $\theta$ and $\theta^*$. Hence, identification in the population is a
necessary condition for consistent estimation of $\theta^*$ by any
estimation method, and any sampling rule.^1

Assume now that $\theta^*$ is identified in the population. Then the
sampling process condition is that $\theta^*$ also be identified in the
sample. That is, for a given stratification and sample composition, there
must not exist a $\theta \neq \theta^*$ such that, for all possible
samples, the likelihood (2') is identical when evaluated at $\theta^*$ and
at $\theta$.^2 In practice, if $\theta^*$ is identified in the population,
it will generally also be identified in the sample, as the latter
condition can fail only in atypical sampling processes.

Maximum likelihood estimators are, of course, not the only methods
available for estimation of $\theta^*$. The reader interested in
alternative approaches is referred to Amemiya (1975), Manski (1975), and
Manski and Lerman (1977), for presentation of some alternative methods
suitable in certain contexts. Because of their generality of application
and their classical asymptotic efficiency properties, however, maximum
likelihood estimators do occupy a special place in the literature. This
special place will be evident when we next discuss selection among sample
designs, since the available theoretical results relevant to this question
all presume that maximum likelihood methods will be used in estimation.

5.1.2.  Selection  Among  Sample  Designs 

We have earlier stated that there exists only a small set of theoretical
results relevant to the problem of selecting a sample design for choice
model estimation. Before describing these results, it will be useful to
explain why significant findings in this area have been so difficult to
achieve.

^1 Failure of this condition constitutes an important motivation for
experimentation. See Section 6 for a discussion.

^2 Identification in the sample does not preclude the possibility that in
some realizations of the sampling process, a unique maximum likelihood
estimate for $\theta^*$ will not exist.

The traditional statistical measure of the precision of an
(asymptotically) unbiased estimator is the (asymptotic) variance matrix of
its parameter estimates. Maximum likelihood estimates of the choice model
parameters $\theta^*$ are, in general, asymptotically unbiased. Moreover,
holding the sample design and one's prior knowledge of p and F fixed,
maximum likelihood estimates for $\theta^*$ are asymptotically efficient,
in the sense that the asymptotic variance matrix of any other
asymptotically unbiased estimator must exceed that of maximum likelihood
by a positive semi-definite matrix. From the above facts, a natural
strategy for statistically comparing alternative sample designs emerges.
That is, for given sample size and informational conditions, examine how
the maximum likelihood asymptotic variance matrix changes as a function of
the sampling rule used.^1 A statistically "good" sampling rule can then be
defined as one yielding a "small" variance matrix, where smallness of the
matrix can be measured by its trace, largest eigenvalue or some other
statistic.
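The two scalar summaries mentioned here are simple to compute, and it may help to see that they need not rank designs the same way. The sketch below uses two hypothetical variance matrices (not taken from any estimated model) and a hypothetical helper name.

```python
import numpy as np

def precision_summaries(V):
    """Return the trace and largest eigenvalue of a symmetric variance matrix."""
    V = np.asarray(V, dtype=float)
    if not np.allclose(V, V.T):
        raise ValueError("a variance matrix must be symmetric")
    eigenvalues = np.linalg.eigvalsh(V)  # returned in ascending order
    return {"trace": float(np.trace(V)),
            "largest_eigenvalue": float(eigenvalues[-1])}

# Hypothetical asymptotic variance matrices under two sampling rules
V_a = np.diag([2.0, 0.2])
V_b = np.diag([1.5, 1.0])
summary_a = precision_summaries(V_a)
summary_b = precision_summaries(V_b)
```

In this constructed example rule A has the smaller trace while rule B has the smaller largest eigenvalue, so the choice of summary statistic can itself affect which design is judged "good".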

The asymptotic variance matrices of maximum likelihood estimates for
$\theta^*$ obtained under alternative exogenous and choice based sampling
designs and various informational conditions are given in Manski and
McFadden (1977).^2 Inspection of these matrices reveals the following:

(i)  For given prior information on p and F used in estimation, the
relative precision of alternative sampling rules depends on the unknown
value of $\theta^*$. In particular, no design is uniformly best across the
possible values of $\theta^*$. This implies that the optimal sampling rule
will depend on the true value of $\theta^*$, which, if known in the first
place, would obviate the need for sampling altogether.


^1 We note that under all stratified sampling rules, the asymptotic
standard errors of $\theta^*$ estimates decrease with sample size at the
rate $1/\sqrt{N}$.

^2 While these authors do not present variance matrices for stratified
designs other than choice based and exogenous ones, the findings reported
below extend to such designs directly.


(ii)  For a given value of $\theta^*$, the relative precision associated
with alternative sampling rules depends on what prior information on p(z)
and F is used in estimation. In particular, knowledge of either the
distribution of attributes, or the shares of the population in each
stratum, will decrease the variance of the parameter estimates, except
that knowledge of p alone is valueless if exogenous sampling is used.

(iii)  For given $\theta^*$ and p-F information, the maximum likelihood
asymptotic variance matrix is, except in some special cases, an
analytically complicated function of the sampling rule. Hence, a ranking
of rules by precision is usually difficult to achieve.

These three facts constitute the source of the problems researchers
attempting to statistically compare alternative sample designs have faced.
It should be noted that these problems are not peculiar to choice model
estimation. In fact, they apparently arise in all non-linear modelling
contexts. The reader familiar with the strong findings of classical
sampling theory and expecting similar results here should recall that the
classical theory presumes the relatively simple linear model
$Y = x\beta + \varepsilon$, $E(\varepsilon|x) = 0$,
$V(\varepsilon) = \sigma^2 G$ with G known and $\beta$ and $\sigma$ to be
estimated. The classical results do not apply outside this model, for
example, even if the only change is to make G unknown.^

With the above as prelude, the theoretical results on sample design that
have been achieved are now described. We first present the available
analytical results, then the findings of some Monte Carlo experiments.
^The classical results for the linear model are given in Rao (1973).

a.  Analytical Results: The primary available analytical results concern
the relative estimation precision obtained under alternative exogenous
sampling designs when neither p nor F is known, and when the choice
probabilities have the form

(11)  $$P(i|z,\theta^*) = D\big( (z_i - z_j)\phi^* + (\gamma_i^* - \gamma_j^*),\ j \in C \big)$$

where $\theta^* = (\phi^*, \gamma_j^*, j \in C)$ and where the function D
is strictly increasing in each of its arguments. In this case, each
$\gamma_j^*$, $j \in C$ is an alternative specific constant, or dummy
variable. All random utility models for which the utilities $U_i$ are
defined to be $U_i = z_i \phi^* + \gamma_i^* + \varepsilon_i$, where
$(\varepsilon_i, i \in C)$ are independent and identically distributed
disturbances, yield choice probabilities satisfying (11). In particular,
the conditional logit model is a member of this class.
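To make the structure of (11) concrete, the sketch below computes conditional logit probabilities, the case that arises when the disturbances are independent type I extreme value. The function name and all numerical values are hypothetical. The check at the end illustrates that only differences in attributes and in the alternative specific constants matter: shifting every alternative's attributes by the same vector, or every constant by the same amount, leaves the probabilities unchanged.

```python
import numpy as np

def conditional_logit_probs(z, phi, gamma):
    """Choice probabilities exp(v_i) / sum_j exp(v_j), v_i = z_i.phi + gamma_i.

    z     : (J, K) array, one attribute row z_i per alternative
    phi   : (K,) coefficient vector
    gamma : (J,) alternative specific constants
    """
    v = np.asarray(z, dtype=float) @ np.asarray(phi, dtype=float)
    v = v + np.asarray(gamma, dtype=float)
    v = v - v.max()  # subtracting a common constant avoids overflow
    e = np.exp(v)
    return e / e.sum()

# Hypothetical attributes, coefficients and constants for three alternatives
z = np.array([[1.0, 0.5], [0.0, 1.0], [2.0, 0.0]])
phi = np.array([0.3, -0.2])
gamma = np.array([0.0, 0.1, -0.1])

p = conditional_logit_probs(z, phi, gamma)
# Common shifts in attributes and constants cancel in the differences
p_shifted = conditional_logit_probs(z + np.array([5.0, -3.0]), phi, gamma + 2.0)
```

The invariance to a common shift in the constants is also why only |C| - 1 of them can be identified in practice.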

Given the above assumptions, Manski (1978) investigates the maximum
likelihood asymptotic variance matrix for $\theta^*$ estimates under the
hypothesis $\phi^* = 0$. It turns out that under this hypothesis, this usually
complex matrix takes on a relatively simple form. In particular, if the choice
set contains two alternatives or if $\gamma_j^* = 0$ for all $j \in C$, then
the asymptotic variance matrix can be written in terms of a constant $a > 0$,
the sample size $N$, the $(|C| - 1)$-dimensional identity matrix $I$, and

$$V(z) = E\left( \sum_{i \in C} \sum_{j \in C} (z_i - z_j)'(z_i - z_j) \right),$$

the expected sampling variance in attributes across alternatives in the
choice set.

It follows from the above that, given the assumptions imposed, a good
exogenous sampling design is one in which the decision-makers drawn face widely
disparate alternatives within their choice sets, in the sense that the expected
$\sum_{i \in C} \sum_{j \in C} (z_i - z_j)'(z_i - z_j)$ is "large". This is a
rather intuitive result, particularly as it is analogous to the classical
result for the standard linear model. Since the assumptions made in the present
case are so stringent, however, the practical usefulness of the result should
be questioned.

Actually, the work described here does have one important application.
In practice, one sometimes is interested in determining whether a given choice
model is informative, in the sense that it non-trivially explains some part of
choice behavior. In the context of models of the form (11), a specification
is informative if and only if $\phi^* \neq 0$. Given this, one may wish to
design a sample suitable for the purpose of testing against the null hypothesis
$\phi^* = 0$. For this purpose, the Manski (1978) results describe the
characteristics of a good sample design.
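
As a rough illustration of the design criterion above, the following sketch
computes a sample analogue of the expected sum of pairwise attribute
differences for two hypothetical exogenous samples. The function names, the
restriction to a single scalar attribute (so that each term reduces to a
squared difference), and all data values are assumptions made for
illustration; they are not taken from Manski (1978).

```python
def pairwise_spread(choice_set):
    """Sum over i, j in C of (z_i - z_j)^2 for one scalar attribute."""
    return sum((zi - zj) ** 2 for zi in choice_set for zj in choice_set)


def expected_spread(sample):
    """Sample average of the pairwise spread across decision-makers."""
    return sum(pairwise_spread(c) for c in sample) / len(sample)


# Each inner list holds one attribute value per alternative (hypothetical data).
similar_alternatives = [[10.0, 11.0, 10.5], [20.0, 20.5, 21.0]]
disparate_alternatives = [[10.0, 25.0, 40.0], [5.0, 30.0, 60.0]]

v_similar = expected_spread(similar_alternatives)
v_disparate = expected_spread(disparate_alternatives)
print(v_similar, v_disparate)
```

Under the stringent assumptions discussed above, the second sample, whose
decision-makers face more disparate alternatives, would be the preferred
design.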

Having offered a legitimate application, we now give a word of caution.
It is tempting to extrapolate from the special case described here and
conclude that in exogenous sampling a large attribute variance among the
alternatives in a choice set is always desirable. This, however, is not the
case. In particular, consider the simple one parameter binary choice logit
model. The asymptotic variance of the exogenous sampling maximum likelihood
estimate for $\theta^*$ can then be shown to be

$$\frac{1}{N} \left( E \left[ \frac{z_{ij}^2 \, e^{\theta^* z_{ij}}}{(1 + e^{\theta^* z_{ij}})^2} \right] \right)^{-1}.$$

Inspection of the operand in the above expression reveals that if
$\theta^* \neq 0$, then

$$\lim_{|z_{ij}| \to \infty} \frac{z_{ij}^2 \, e^{\theta^* z_{ij}}}{(1 + e^{\theta^* z_{ij}})^2} = 0,$$

from which it follows that the value of $|z_{ij}|$ minimizing the variance of
the $\theta^*$ estimate is finite. On the other hand, if $\theta^* = 0$, then
in accord with the discussion of this section, we find that

$$\frac{z_{ij}^2 \, e^{\theta^* z_{ij}}}{(1 + e^{\theta^* z_{ij}})^2} = \frac{z_{ij}^2}{4},$$

implying that the variance of the $\theta^*$ estimate is an everywhere
decreasing function of $|z_{ij}|$.
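
The behavior of the operand can be checked numerically. The sketch below
assumes that $z_{ij}$ is a scalar attribute difference between the two
alternatives and that the logit choice probability is
$1 / (1 + e^{-\theta z_{ij}})$; the function name, the grid of values, and
the choice $\theta = 1$ are illustrative assumptions only.

```python
import math


def information_term(z, theta):
    """Operand z^2 * e^(theta*z) / (1 + e^(theta*z))^2, written as z^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-theta * z))  # assumed logit choice probability
    return z * z * p * (1.0 - p)


grid = [0.5 * k for k in range(1, 41)]  # |z| from 0.5 to 20.0

# theta != 0: the term rises, peaks at a finite |z|, then decays toward zero.
theta = 1.0
best_z = max(grid, key=lambda z: information_term(z, theta))
tail_value = information_term(20.0, theta)

# theta == 0: the term equals z^2 / 4 and keeps increasing with |z|.
flat = [information_term(z, 0.0) for z in grid]
print(best_z, tail_value, flat[-1])
```

Because the asymptotic variance is inversely proportional to the expectation
of this term, a peak at finite $|z_{ij}|$ corresponds to a finite
variance-minimizing attribute difference.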

To close out this discussion, we call attention to a simple result on
choice based sampling which highlights the role of prior information on $p(z)$
and $F$ in determining the relative efficiency associated with alternative
designs. It can easily be shown that if $p(z)$ is known, then the best choice
based sampling design is one in which all observations are drawn from a
single sub-population $T_i$. That is, the best design has all observations
drawn from users of a single alternative. (Which alternative it is best to
draw from depends on the value of $\theta^*$, however.) On the other hand, if
$p(z)$ is not known, then the best design must satisfy the condition
$H_j > 0$, all $j \in C$. In fact, a design not meeting this requirement will
often not even suffice to identify $\theta^*$ in the sample.^1

^1 The first result occurs because in the case of $p$ known, the information
matrix (i.e., the inverse of the variance matrix of the estimates) is linear
in the $H_j$ values. On the other hand, when $p$ is unknown, this matrix turns
out to be linear in $1/H_j$.

b. Monte Carlo Findings: Cosslett (1978b) has been using Monte Carlo
experiments to study the relative estimation precision associated with
alternative choice based sampling designs in the context of single parameter
binary choice models. If the choice set contains the two alternatives $(i,j)$
and sampling is choice based, the analyst's control variable for sample design
is the sample fraction $H_i$. Assuming that the choice probabilities have the
logit, probit, or arctan form and that $p(z)$ is unknown, Cosslett examines how
the asymptotic variance of the $\theta^*$ estimate changes as a function of
$H_i$ and of one's prior knowledge of $F_i$. While Cosslett's work is still
ongoing and while Monte Carlo findings cannot be conclusive, two interesting
tentative findings can be cited.

First, it appears that when $F$ is not known, good designs are ones which
place $H_i$ close to $1/2$. This conclusion is quite strong in the logit and
probit models, less so in the arctan one. On the other hand, when $F$ is known,
it appears optimal to oversample the rare alternative, that is to set
$H_i > 1/2$ if $F_i < 1/2$.

Second, the usefulness of knowledge of $F$ seems evident. Holding the
sample design fixed, such knowledge reduces the variance of the $\theta^*$
estimates substantially in Cosslett's experiments.

5.2  PRACTICAL  CONCERNS 

We have seen that the existing theoretical literature on choice model
estimation is very successful in offering the analyst methods for estimating
$\theta^*$.^1 The literature is, contrariwise, very weak in providing guidance
on how one should select among alternative sample designs. Relatively few
results are available and it appears that relatively little can be learned. To
place the literature in appropriate applied perspective, the basic assumptions
of existing theory must first be understood and interpreted.

^1 Although it should be noted that not all the estimators discussed above have
been programmed in available econometric software.

Two assumptions characterize the existing estimation theory. First, the
analyst is presumed able to a priori specify the functional form of the choice
model; the only estimation problem is associated with the value of the
parameter vector $\theta^*$. This assumption is extremely useful because the
data requirements for parametric analysis are considerably smaller than for
non-parametric analysis. Second, one is presumed to have no prior knowledge of
$\theta^*$ whatsoever. This assumption is standard in classical statistical
analysis.

From the perspective of applications, the foregoing two idealized
assumptions stand in interesting contrast. On the one hand, it is undoubtedly
overly optimistic to suppose that in practice one can correctly place the
choice probabilities in a known parametric family. Behavioral theory and
empirical observation will usually let one put some structure on the choice
probabilities, but not the very exacting structure implied by a
parameterization. On the other hand, having specified a parametric family, it
will often be too pessimistic to assert that the analyst is totally ignorant
about $\theta^*$. Loosely, $\theta^*$ characterizes tastes and we usually do
have some prior knowledge of what people's tastes are.

With  the  above  as  background,  three  general  comments  relevant  to  sampling 
practice  can  be  offered. 

First, in designing a sample for choice model estimation, concern with
estimability, that is the ability of the design to support estimation of
$\theta^*$ at all, should dominate worry about estimation precision. This
advice is given for a very simple reason. That is, estimability is a necessary
requirement before precision can even become an issue.

Second, to the extent that one does become concerned with the relative
precision of alternative designs, the classical statistical framework assumed
in the existing theoretical literature should be applied sensibly rather than
dogmatically. If the analyst has prior information about the value of
$\theta^*$, he should use it in selecting among designs.^1 If he views his
choice model as only an imperfect approximation of reality, he should recognize
that theoretical results comparing designs can themselves hold at most
approximately. If the classical measure of estimation precision, that is the
asymptotic variance matrix, differs from the measure he feels most desirable,
he should understand that a classical ranking of designs may not be most
appropriate.

^1 A formal framework for incorporating such prior information is provided by
Bayesian analysis. See Section 7 for further discussion.

Third, it should be understood that the problems of describing $p$ and
estimating $\theta^*$, while formally distinct, are nevertheless related in
various ways. For one thing, prior knowledge of $F$ is both necessary to
describe $p$ and useful in estimating $\theta^*$. For another, the $F$ values
can in theory be estimated along with $\theta^*$ and then used in describing
$p$. (Such joint estimation of $F$ and $\theta^*$ is possible except when the
sampling is exogenous or the choice model has the conditional logit form. See
Manski and McFadden, 1977.) Perhaps most important, the same data sample is
often used both to describe $p$ and estimate $\theta^*$. When such dual use is
intended, the sample design selected must be suitable for both objectives.
6. DESIGN OF EXPERIMENTS


Consider  a proposed  policy  whose  effect,  if  implemented,  would  be  to 
create  a travel  environment  which  differs  in  some  way  from  that  currently 
faced  by  decision  makers.  To  forecast  the  impact  of  such  a policy  on  travel 
behavior,  the  choice  probabilities  which  would  prevail  under  the  new  travel 
environment  must  of  course  be  known.  However,  in  the  absence  of  historical 
experience,  these  probabilities  may  not  be  a priori  known,  nor  estimable  from 
data  on  current  travel  choices. 

For example, suppose one were interested in forecasting how the
introduction of buses with wheelchair lifts into an area without such buses
would influence transit usage. In a mode choice model, any parameters relating
to the desirability of buses with lifts would not be estimable from
pre-introduction data. In such contexts, where the choice probabilities
cannot be inferred from existing data, it may be useful to subject a subset
of the decision making population to the policy of interest, observe
their subsequent behavior and then infer the required choice probabilities
from these observations. That is, it may be useful to perform an experiment.

Within  the  framework  of  discrete  choice  analysis,  an  experiment,  like 
a permanent  policy  change,  can  be  viewed  formally  as  a function  y(z)  changing 
each  decision  maker's  present  attribute  value  into  some  new  value. 

By combining information about the population's behavior under the
pre-experiment attributes $z$ and their behavior under the post-experiment
attributes $y(z)$, the set of choice probabilities for any attribute vector in
the original range of attributes $Z_0$ or in the post-experiment range
$Z_1 = (y(z), z \in Z_0)$ can, in theory, be inferred.
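
The view of an experiment as a map on attributes can be sketched as follows.
This is only an illustration: the policy function, the two-component
attribute vector (fare, has_lift), and the fare values are hypothetical, and
each decision-maker is reduced to a single attribute vector for brevity.

```python
def add_wheelchair_lifts(z):
    """Hypothetical experiment y(z): equip every bus with a lift, leave fares alone."""
    fare, _has_lift = z
    return (fare, 1)


# Z0: attribute vectors (fare, has_lift) observed before the experiment.
Z0 = {(0.50, 0), (0.75, 0), (1.00, 0)}

# Z1 = {y(z) : z in Z0}, the post-experiment range of attributes.
Z1 = {add_wheelchair_lifts(z) for z in Z0}

# Choice probabilities can in principle be inferred over both ranges together.
observable_range = Z0 | Z1

# The lift attribute varies only once pre- and post-experiment data are pooled.
lift_varies_before = len({has_lift for _, has_lift in Z0}) > 1
lift_varies_after = len({has_lift for _, has_lift in observable_range}) > 1
print(lift_varies_before, lift_varies_after)
```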
It is natural to ask how one should design an experiment so that it will
be informative regarding the consequences of policies of interest. This
important question has not previously been addressed in the discrete choice
literature. To begin what should eventually be an extensive investigation,
Section 6.1 offers a few simple theoretical results relevant to the design
of experiments. Section 6.2 then discusses some practical concerns that
arise in experimentation.

Before proceeding, we should point out that experimentation has uses
beyond the discrete choice applications discussed here. In particular,
experiments may be used to determine what consequences a policy change would
actually have for the attribute distribution. That is, when the function
$y(z)$ associated with a policy change is unknown, experimentation may enable
one to learn this function. Many of the experiments carried out in UMTA's
Service and Methods Demonstration Program have this objective. Experiments
performed for such purposes will not be discussed further in this paper.

6.1  THEORETICAL  RESULTS 

The role of experiments within the existing theory of choice model
estimation can be easily described. Assume as usual that the choice
probabilities $(P(i \mid z),\ z \in Z)$ have been placed in a parametric family
indexed by $\theta^*$. Let there exist a proposed policy of interest which
would map $Z_0$ onto an attribute set $Z_1$ and assume that the choice
probabilities $(P(i \mid \tilde{z}),\ \tilde{z} \in Z_1)$ can themselves be
parametrized by $\theta^*$. Consider now a situation in which some subset
$\theta_1^*$ of $\theta^*$ is not identified in the present population
described by $f(i,z) = P(i \mid z, \theta^*)\,p(z)$ but is identified in the
post policy population described by
$\tilde{f}(i,\tilde{z}) = P(i \mid \tilde{z}, \theta^*)\,\tilde{p}(\tilde{z})$.
Hence, forecasting post policy behavior requires that $\theta_1^*$ be known,
but $\theta_1^*$ cannot be inferred from observations of present choices.
From this, the objective of experimentation emerges, that is to modify the
present attribute distribution in such a way that $\theta_1^*$ is identified
in the post-experiment population.

The above discussion is quite abstract. Let us therefore describe some
simple examples of applied importance. Let $z_{ti} = (z_{0ti}, z_{1ti})$,
$\theta^* = (\phi_0^*, \phi_1^*, \gamma_i^*, i \in C)$ and assume that the
choice probabilities $P(i \mid z)$ are derived from the random utility model

$$U_{ti} = \gamma_i^* + \phi_0^* z_{0ti} + \phi_1^* z_{1ti} + \epsilon_{ti},$$

where $(\epsilon_{ti}, i \in C)$ has some given probability distribution.
Consideration of four problems within this context will serve to indicate some
of the uses of experiments.

(i) Assume that for each $t \in T$, $z_{1ti} = z_{1tj}$ for all
$i, j \in C$. That is, alternatives are, for each decision-maker, completely
homogeneous along the $z_1$ attribute. Clearly, $\phi_1^*$ is not then
identified. If we wish to forecast behavior under a policy which makes
alternatives heterogeneous along the $z_1$ attribute, knowledge of $\phi_1^*$
is necessary. For example, local regulatory policy tends to create taxi fares
which are uniform across operators in a given area. In a model of choice of
which taxi company people call for service, a coefficient (corresponding to
$\phi_1^*$) for the effect of taxi fare (corresponding to $z_1$) could not be
estimated.


(ii) Assume that for all $s, t \in T$ and all $i \in C$,
$z_{1ti} = z_{1si}$. That is, the $z_1$ attributes are constant across
decision-makers. In this case the composite parameters
$(\gamma_i^* + \phi_1^* z_{1i}),\ i \in C$ may be identified, but not the
$\gamma_i^*$'s and $\phi_1^*$ separately. If we wish to forecast behavior under
a policy which makes the $z_1$ attributes heterogeneous across decision-makers
or one which simply changes the $z_1$ values uniformly, the $\gamma_i^*$ and
$\phi_1^*$ parameters must be known. An example of this is if $z_{1ti}$ were a
dummy variable in a mode choice model which indicated whether or not a mode was
characterized as demand responsive. Since most urban areas do not offer demand
responsive service, and the auto mode by its very nature is demand responsive,
it would be impossible to distinguish the effect on utility of an automobile
constant from that of a dummy variable describing whether or not a mode was
demand responsive. Only a composite automobile/demand responsive constant
could be estimated.

(iii) Assume that for each $t \in T$ and $i \in C$,
$z_{0ti} = a z_{1ti} + b$ for some $a$ and $b$. That is, the $z_0$ and $z_1$
attributes are perfectly correlated. Now $\phi_0^* a + \phi_1^*$ may be
identified but not $\phi_0^*$ and $\phi_1^*$ separately. If we wish to
forecast behavior under a policy which makes $z_0$ and $z_1$ less than
perfectly correlated or one which simply changes the $a$ constant, the values
$\phi_0^*$ and $\phi_1^*$ must somehow be learned. One example of this might
occur in small cities with metered taxis and zonal bus fares. In such cases, it
is conceivable that fares by taxi and bus would be perfectly correlated
with in-vehicle travel times; car operating costs and times would be
similarly correlated. In this case, it would be impossible to estimate
separate time and cost effects; only a composite coefficient for the
sum of cost and in-vehicle time could be estimated.


(iv) Assume a choice $k \in C$ is entirely unavailable to a population, and
$P(k \mid z)$ would always therefore be zero. In this case, if
$z_{1ti} = z_{1tj}$ for all $i, j \in C$, $i, j \neq k$, then $\phi_1^*$ and
$\gamma_k^*$ would not be estimable. For example, some small cities may not
have any transit service. In this case, the parameters of all the transit
specific variables would not be estimable, including the transit constant.
Some parameters, such as that for generic travel time, however, could be
estimated from existing choices among carpooling, driving alone and taxi.
Appropriately designed experiments can solve each of the above problems.
For  example,  one  might  in  each  case  subject  a subset  of  the  decision  making 
population  to  a scaled  down  version  of  the  intended  policy.  However,  such  a 
strong  correspondence  between  experiment  and  policy  is  not  necessary.  Any 
modification  of  the  present  attribute  distribution  which  renders  the  relevant 
parameters  identified  will  suffice. 
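
The perfectly correlated case can be illustrated with a small numerical
check. The sketch below assumes a binary choice so that only attribute
differences between the two alternatives matter; a zero determinant of the
cross-product matrix of those differences signals that the two attribute
columns are collinear, so only a composite coefficient could be recovered.
The constants, data values, and the particular experiment are hypothetical.

```python
def gram_determinant(rows):
    """Determinant of X'X for a two-column matrix X given as (x0, x1) rows."""
    s00 = sum(x0 * x0 for x0, _ in rows)
    s11 = sum(x1 * x1 for _, x1 in rows)
    s01 = sum(x0 * x1 for x0, x1 in rows)
    return s00 * s11 - s01 * s01


a, b = 2.0, 3.0  # hypothetical constants in z0 = a * z1 + b

# Differences in z1 between the two alternatives for four decision-makers.
dz1 = [1.0, -2.0, 0.5, 3.0]

# Under perfect correlation the constant b cancels, so dz0 = a * dz1.
before = [(a * d, d) for d in dz1]

# Hypothetical experiment: shift z0 for two decision-makers only.
shifts = [0.0, 1.5, 0.0, -1.0]
after = [(a * d + s, d) for d, s in zip(dz1, shifts)]

det_before = gram_determinant(before)
det_after = gram_determinant(after)
print(det_before, det_after)
```

Note that the experiment here need not resemble any intended policy; it only
has to break the exact linear relation between the two attributes.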

Return now to our abstract discussion of experimental design. Given a
set of experiments each of which can identify $\theta_1^*$, it remains to ask
how one should select among this set. This question has a very simple formal
interpretation. Each experiment one can conduct produces some new attribute
distribution. Given this, the problem of selecting a good experiment becomes
one of selecting a good attribute distribution. The latter problem is
obviously closely related to the sample design problem treated in Section 5
but has not itself been investigated thus far.

6.2  PRACTICAL  CONCERNS 

In most respects, the practical concerns that arise in designing an
experiment are analogous to those that arise in designing a sample for choice
model estimation. The experimental design problem does, however, have one
aspect that does not appear when sampling. That is, a duration for the
experiment must be selected.

If  it  could  be  shown  that  people  react  quickly  to  changes  in  their  travel 
environments,  the  duration  question  would  be  of  little  consequence.  However, 
there  exists  ample  anecdotal  evidence  that  adjustments  to  new  conditions  occur 
only  slowly.  Moreover,  the  adjustments  people  make  when  they  believe  a change 
is  temporary  may  differ  from  those  they  make  when  they  think  one  is  permanent. 

Existing discrete choice theory, being static, obviously can offer no
guidance on the selection of an experiment's duration. The nascent dynamic
choice theory cited in Section 1 could potentially provide such guidance but
will require considerable development before it becomes useful in practice.

A second practical issue arises when none of the four theoretical
conditions for non-identification discussed in Section 6.1 holds entirely, but
one or more is nearly true. That is, while it may be theoretically possible
to estimate all the parameters of interest, some estimates may be extremely
unreliable. In such situations, it might prove useful to conduct an
experiment which increases the range of an attribute or makes a particular
alternative available to more decision makers.

For example, even in a fixed fare system, there may be some variation
in transit fares due to people having to transfer to make certain trips
(assuming, of course, that transfers are not free). However, while this
variation in transit fare may theoretically identify a parameter for
transit fare in a mode choice model, the reliability of the parameter estimate
may be extremely low. In this case, a zone fare experiment on selected
routes would be a useful way to improve predictions of more extensive fare
policy changes.
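
The gain in reliability from widening the range of an attribute can be
sketched with the one parameter binary logit variance expression from
Section 5. Everything specific here is a hypothetical assumption made for
illustration: the fare coefficient, the sample size, the fare differences,
and the treatment of each listed difference as equally likely.

```python
import math


def asymptotic_variance(z_values, theta, n):
    """(1/N) * (E[z^2 * p * (1 - p)])^(-1), with each z taken as equally likely."""
    def term(z):
        p = 1.0 / (1.0 + math.exp(-theta * z))  # assumed logit probability
        return z * z * p * (1.0 - p)

    expected_info = sum(term(z) for z in z_values) / len(z_values)
    return 1.0 / (n * expected_info)


theta = -0.5  # hypothetical fare coefficient
n = 500       # hypothetical sample size

# Fare differences between the two modes, in dollars (hypothetical values).
fixed_fare = [0.00, 0.00, 0.00, 0.25]   # variation only from paid transfers
zone_fare = [-0.50, 0.00, 0.50, 1.00]   # zone fare experiment on some routes

var_fixed = asymptotic_variance(fixed_fare, theta, n)
var_zone = asymptotic_variance(zone_fare, theta, n)
print(var_fixed, var_zone)
```

With these illustrative numbers the fare parameter is identified in both
cases, but the asymptotic variance is far smaller under the wider fare range.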

To conclude this section, we should comment on the substantial
difference between the use we have advocated for experiments and a more
traditional view of experimentation. Traditionally an experiment is performed
to test for the existence of an "effect" when a single factor in the
environment is changed. No explicit model is assumed and proper inference
requires that the factor of interest be the only one that changes over the
duration of the experiment. Otherwise, the effect may be "confounded".^1 Here,
in contrast, an experiment is a mechanism for learning the values of
parameters in a formal model. There is no restriction on the number of factors
changing during the experiment, whether by design or otherwise. Proper
inference from the experiment requires only that the changes that do occur
either be observed or, if unobserved, satisfy appropriate statistical
conditions so that the assumed probabilistic choice model continues to
describe behavior.

^1 In practice, confounding is very difficult to avoid in transportation
experiments as the real world environment in which such experiments are
conducted does not permit ceteris paribus conditions.
7. DIRECTIONS FOR FUTURE RESEARCH

A reading of this paper indicates that the state of the art of sample
design for discrete choice analysis is advanced in some respects and
primitive in others. It is important to note that many of the results reported
here are quite recent and that further work will undoubtedly resolve some
of the questions raised in the report. Discrete choice analysis is still
a quickly growing area of knowledge and future work on sample design problems
will hopefully make more precise statements about alternative sampling
strategies possible. A large set of useful directions for future research may
be enumerated.

With regard to estimation of the attribute distribution, research in
three areas would seem particularly productive. First, there is a need for
a better understanding of the relative merits of the various methods for
determining the population shares $F$ and of the implications of using
estimated $F$ values in estimating the attribute distribution. Second, more
formal statistical criteria for comparing alternative stratified sample
designs should be developed. Third, ways to use various forms of prior
knowledge of the attribute distribution in the estimation process should be
researched. In particular, while interested in the current attribute
distribution, one often has available knowledge of this distribution at some
past time. Duguay, Jung, and McFadden (1976) have developed an interesting but
ad hoc approach to updating such past attribute distributions using available
aggregate data on current conditions. Work aimed at assessing the properties
of their method would be useful.

In the area of choice model estimation, extensions of the classical type
analytical work of Manski (1978) and Monte Carlo tests of Cosslett (1978b)
would be of some use. Of potentially greater value, however, would be work
aimed at replacing the classical sample design framework assumed in the
existing discrete choice literature with a more powerful one. In particular,
the Bayesian approach offers such a framework. Adoption of this approach
would provide a means of incorporating prior information on $\theta^*$ into
both the sample design and choice model estimation process. Additionally, the
statistical decision theory aspect of Bayesian analysis offers a wide variety
of sampling strategies outside of the stratified sampling rules applied to
date.^1

The field of experimentation offers some of the most interesting challenges
for future research. In particular, there presently exists no theoretical basis
for selection among alternative experimental designs. While we have previously
stated that the experimental design problem seems similar to that of sample
design, the exact relation between the two problems is not clear. A second
important issue in experimentation regards the selection of a duration for the
experiment. Consideration of this question ultimately leads one to be
concerned with the dynamics of choice behavior and thus beyond the static
discrete choice framework assumed in this paper.

As  a final  direction  for  future  work,  recall  that  the  attribute  space  Z 
is  assumed  a priori  defined  in  this  paper  but  that  its  structure  is  actually 
under  the  control  of  the  analyst  through  his  decisions  to  collect  data  on 
some  attributes  but  not  others.  Research  aimed  at  providing  guidance  to  aid 
the  analyst  in  these  decisions  could  prove  quite  useful. 


^1 For an extensive treatment of the Bayesian approach, see DeGroot (1970).
Report of Inventions Appendix


Although a diligent review of the work performed under this
contract has revealed that no new innovation, discovery, or invention
of a patentable nature was made, this report summarizes recent
advances in the theory of sample design for discrete choice analysis
and presents some theoretical results and practical guidelines which
are new and have not been previously reported. For example, Section
4 on sample design for description of the attribute distribution
concludes with some heuristic guidance for selecting among alternative
sampling strategies. Section 5 presents guidance on sample design for
choice model estimation based on new analytical results and the
findings of recent Monte Carlo experiments. Section 6 presents some novel
ideas on the role and design of experiments to learn about the values
of probabilistic choice model parameters.